Harmonizing existing climate change mitigation policy datasets with a hybrid machine learning approach


ABSTRACT With the rapid proliferation of climate policies in both number and scope, there is an increasing demand for a global-level dataset that provides multi-indicator information on


policy elements and their implementation contexts. To address this need, we developed the Global Climate Change Mitigation Policy Dataset (GCCMPD) using a semisupervised hybrid machine


learning approach, drawing upon policy information from global, regional, and sector-specific sources. Differing from existing climate policy datasets, the GCCMPD covers a large range of


policies, amounting to 73,625 policies of 216 entities. Through the integration of expert knowledge-based dictionary mapping, probability statistics methods, and advanced natural language


processing technology, the GCCMPD offers detailed classification of multiple indicators and consistent information on sectoral policy instruments. This includes insights into objectives,


target sectors, instruments, legal compulsion, administrative entities, etc. By aligning with the sector classification of the Intergovernmental Panel on Climate Change (IPCC) emission


datasets, the GCCMPD serves to help policy-makers, researchers, and social organizations gain a deeper understanding of the similarities and distinctions among climate activities across


countries, sectors, and entities.


BACKGROUND & SUMMARY With the first World Climate Conference in 1979 marking the start of an


international focus on climate change issues, the number and scope of climate-related policies have increased substantially. At the global level, an international governance scheme


formulated the Kyoto Protocol in 1997 and the Paris Agreement in 2015 to set up national mitigation targets and ancillary mechanisms. National, subnational and sectoral policies have also


flourished in recent years, with some local consideration of other objectives such as environmental protection, economic development, equity thinking and sustainable development.


Quantitative policy studies have emerged accordingly, either assessing the specific policy effect empirically or simulating the policy impacts virtually. The literature focusing on the


analysis of a single policy typically involves only a single country (or even a specific industry in a certain country)1,2,3,4,5,6 or a comparison of a few countries7,8, thus requiring a


limited amount of policy data. A more detailed and consistent climate policy dataset enables large-scale quantitative analysis of global climate policies and provides new information for


policy-makers. For example, through detailed classification based on direct and indirect laws, the differences in the coverage scope of greenhouse gas targets can be analysed9. Through the


classification of instruments, various patterns of policy convergence and divergence across countries and the driving factors behind them can be compared10. However,


differences in the circumstances of policy formulation and implementation drive wide variations among similar policy instruments, and these variations have not been fully captured, which limits the


evaluation of policy effects. In addition, some rising concerns, such as the constraints of policy adoption in developing economies, the synergistic effects of the policy mix, and the


spillover effects of market-based instruments across regions, call for more comparative studies with global or cross-regional perspectives. Such demands in policy science require a more


finely categorized, multi-indicator climate policy dataset that allows for convincing investigation of policy design under the premise of clarifying the implementation context. Furthermore,


the climate policy literature has diversified significantly with the emergence of new climate practices and the availability of more advanced data processing methods. According to SciVal


(see Supplementary Information (SI1) for detailed search terms), the number of articles in climate policy-related research areas has been increasing annually since 1997, reaching 6,946


articles in 2022. Concurrently, the number of topics within the global climate policy research domain has also expanded, rising from 499 in 2018 to 751 in 2022 (Supplementary Figs. 1-2).


Climate policy evolution represents one of the emerging branches within this research domain, encompassing various aspects such as policy coverage9,11,12, factors influencing policy


adoption10,13,14, policy diffusion15,16, and policy themes and trends17,18,19,20,21. The utilization of policy quantity as the primary variable has broadened the scope of research on policy


effects and their relationship with greenhouse gases22,23,24. At the same time, there has been a gradual shift towards more detailed and consistent exploration of policy design and


comparison, adopting a mixed policy perspective25,26,27,28,29,30. In addition, research on the trade-offs between multiple goals31,32,33,34 also requires the support of multi-indicator


policy datasets. To the best of our knowledge, the existing global climate policy datasets are still not sufficient to meet the above research demands. Three points illustrate


this. (1) The lack of sector-specific policy instrument information prevents in-depth research on sectoral policy mixes. The lack of linkage between sectoral policies and sectoral


carbon emissions, coupled with the varying focus of existing datasets on sectors35, further compounds these limitations. (2) Variations in policy versions and coverage can also undermine


the robustness of research that relies on policy density as a metric35. (3) The separation of the law from a large number of supportive and weaker policies will also cause the role of


supportive policies to be ignored36. The above shortcomings can be summarized as follows. First, the numbers and scopes of policies are still not complete, and updates cannot be made in time


due to manual methods of data collection and processing. Second, there is a lack of consistent and more detailed policy categorization in comparable standards, partly due to the overlapping


scope of sectors, complexity of entities involved, and difficulty in identifying indirect policies. These limitations also make it impractical to conduct comparative analysis for the


identification of additional features across datasets. Additionally, studies comparing existing mainstream global datasets have shown that differences in data coverage and inconsistent


classification standards lead to inconsistent results when using different datasets35. Finally, legal enforcement characteristics and information about administrative entities are crucial in


driving changes in policy design. The aforementioned features may impact the practical outcomes of policy implementation and serve as a complement to policy ambitions35,36. We attempted to


construct the Global Climate Change Mitigation Policy Dataset (GCCMPD) to fill the gap between the existing climate policy datasets and the increasing demand for research by harmonizing the


existing datasets with a hybrid machine learning approach. The GCCMPD currently includes 73,625 policies of 216 entities, with a substantial increase in the numbers and scopes of policies


through a complete search of available sources. For each policy in the dataset, we extracted important policy features such as target sector, policy instrument, objective, binding force,


executive/legislative, jurisdiction, etc. We mainly used semisupervised machine learning and combined various natural language processing methods to achieve consistent classification of each


feature of the policy, which enhances the objectivity and extensibility of the data. Compared with the existing datasets, our dataset can better respond to the demands of empirical studies


and cross-national comparisons of policies. First, our dataset can be used to build more accurate policy indicators. On the one hand, our dataset includes more policy records than any


individual dataset and has fewer omitted policies, which can reduce the underestimation of entities' efforts to mitigate climate change. Coordinated climate policy data can also enrich


single-country or small-n case studies30,35,37. On the other hand, our dataset provides features of the policies, including the binding force, the executive/legislative and the


objectives. Since the differences in the circumstances of policy formulation and implementation can vary greatly among policies with different binding forces (for example, laws and


preparatory instruments) and strongly influence the output of the policies, future studies can weight the number of policies by their binding force or add the numbers of soft and hard laws


into their model as two separate variables rather than simply counting the number of policies23,24,35,36. Second, our dataset enables us to measure the output of sectoral policies. Our


dataset categorizes the policies into sectors according to the IPCC Sixth Assessment Report (AR6) standard and can be easily linked to emission datasets with the same standard, such as the


Emissions Database for Global Atmospheric Research (EDGAR). Thus, with the sectoral emission data from these datasets and climate policy indicators calculated by sector instruments with our


dataset, the sectoral emission reduction performance of the policy system (rather than the emission reduction performance of single policies38,39,40,41) can be measured. Third, the policy


similarity information of our dataset supports a better exploration of climate policy diffusion. We provide information on the similarity of the context of climate policies, facilitating the


inference of policy diffusion using both temporal and contextual data. This approach has the potential to outperform existing studies42, which primarily rely on inferring the probability of


diffusion from one entity to another based solely on the time gap between the adoption of climate policies in the two entities. Finally, our dataset includes information on the binding


force and policy objectives, enabling studies to explore the relationship between these features and other factors. For example, researchers can investigate how various conditions, such as


economic or geographic factors, may influence the binding force of policies, as well as the differences in policy outcomes associated with different levels of binding force. Moreover, our


dataset enables studies to focus on the dynamics between hard law and soft law, such as examining how soft law transitions into hard law and to what extent different types of laws influence


emissions43,44. METHODS We constructed the GCCMPD using a multilevel process and framework (Fig. 1). Specifically, this entailed source identification and selection of policy data, selection


of policy characteristic indicators, data collection, data processing, manual checks, verification, annotation, dataset expansion and some possible uses. The final data are also stored in


different ways to facilitate various types of users. DATA IDENTIFICATION Before identifying the data source, it is important to define the boundaries of the dataset described in this paper.


At present, there is no clear definition of climate policy, and existing attempts to define climate policy have adopted very broad and vague scopes20. The reason is that climate policy


covers a wide range of areas. For example, greenhouse gases and pollutants have common sources45, resulting in considerable overlap between climate policy and energy and environmental


policy. Sometimes climate policy is directly defined as policies to reduce greenhouse gases and air pollution. Because our purpose is to construct a multipurpose dataset with detailed, multi-indicator


classification, we adopt a relatively broad definition: climate policy is a set of laws, regulations, strategies, and other measures related to climate change, which can be


further divided into climate mitigation and climate adaptation policies. Dataset users can filter by attributes such as sources to obtain more targeted subsets of data, despite the broad


boundaries of the dataset. Under the above definition, the GCCMPD draws on three categories of sources (see Table 1). The first category is global datasets: the Climate Change Laws of the World (CCLW),


the International Energy Agency (IEA) Policies and Measures Dataset, and the Climate Policy dataset (CP). The second category is regional datasets, such as the Asia Pacific Energy Portal (APEP)


and the European Environment Agency (EEA), which cover a specific region and contain a considerable number of policies. The third category is datasets that focus on specialized fields such as


technology, policy instruments, and legislation. These data are extremely specialized but narrow in scope. For the ECOLEX dataset, we selected climate-related subjects (see Supplementary


Table 1 for details). We selected IEA, CP, and CCLW as the training sets for manual annotation, accounting for approximately 16% of all data sources (excluding the legal part of ECOLEX,


which accounts for approximately 70%). Since the GCCMPD is a multi-indicator dataset, it is appropriate to choose authoritative and diverse data sources, and these three datasets have


gradually been recognized by the academic community18,22,23. We first harmonized the three datasets to form a “core” dataset via the natural language processing technique and manual


checking. The “core” dataset can support the demand of analysing global climate mitigation policies since these three datasets cover various types of climate policies (laws, commitment


agreements, goals; national level, subnational level) from almost all of the entities in the world, which can basically guarantee the diversity and representativeness of the core dataset.


Additionally, we noticed that the data coverage of the “core” dataset can be further improved by including data from other data sources. Then, we used the machine learning models trained on


the manually checked core dataset to label data from other sources and formed a full dataset. The IEA, CP, and CCLW datasets have great potential to complement each other in terms of country


coverage, sector coverage, laws and nonlaws, policy versions, policy scope, etc. CCLW is a specialized climate law dataset that collects climate laws or equally important policies23, with


the highest policy importance but a relatively small number of policies. The IEA was the first of the three datasets to collect policy data on energy-related policies in the building sector.


CP is a synthesized dataset containing policies from other datasets (including CCLW and IEA) and various reporting sources, characterized by detailed information on policy targets9,12,


instruments and sectors11. After harmonizing with the IEA and CP datasets, the following complements were formed: (1) subnational policies and early policy versions; (2) nonlegal supportive


“weaker” policies; (3) specific sector policies (such as the building sector); and (4) more balanced national policies. A detailed comparison of the main characteristics of the three


datasets is given in Supplementary Table 2. The main purpose of the comparison is to identify which classification fields (shown in bold in the table) of these three datasets can support our own


classification. INDICATOR SELECTION AND DESCRIPTION The GCCMPD aims to provide a multi-indicator dataset that can be researched and consulted by scholars in various fields, policy-makers, and


various social organizations. We mainly chose the following indicators to meet different research needs: sector, instrument, objective, legally binding force, enforcement settings and


jurisdiction. Due to the diverse sources of the GCCMPD, choosing a unified standard and definition is conducive to maintaining the consistency of the dataset. * SECTOR: One of the


characteristics of the GCCMPD is that it can be closely integrated with the carbon emission dataset, which requires sectoral classification of policies to match them with GHG emissions. Most


mainstream GHG emissions datasets46,47 refer to the IPCC 2006 reporting guidelines48 for sectoral classification and generally attribute global GHG emission sources to five broad sectors49


of energy systems, industry, buildings, transport and AFOLU (agriculture, forestry and other land uses). For the above reasons, the GCCMPD also divides sectors into these five sectors,


adopting the same sectoral classification standards as the carbon emissions dataset based on the EDGAR50, and the classification standards and descriptions are shown in Supplementary Table 


3. * INSTRUMENT: A taxonomy of policy instruments can help policy-makers formulate policy packages, provide possible options for policy practice, and better evaluate climate policies. Some


taxonomies, such as the NATO scheme51 (a widely used standard that categorizes policies based on the types of governing resources they rely on: Nodality, Authority, Treasure, and


Organization), represent general criteria and are less specific to the domain of climate mitigation. We chose the classification standard of the IPCC Fifth Assessment Report (AR5)52,


which has the following advantages. First, the classification is authoritative. The classification criteria are summarized by climate experts based on a large amount of literature,


reflecting the climate policies that the academic community focuses on. Second, after the modification from the Third Assessment Report (TAR)53 to the AR5, the classification has strong


completeness and timeliness. Third, there are sector-level classifications of policy instruments for the above five sectors, which can help to understand and use them in conjunction with


sectoral classifications. As a result, policy instruments are divided into seven categories: taxes, tradable allowances, subsidies, government provisions of public goods or services,


information programmes, regulatory approaches, and voluntary actions. Detailed sector-level standards and examples can be found in Supplementary Table 4. * OBJECTIVE: The mitigation of


climate change is only one of many public issues, and sometimes it is necessary to judge whether climate change has co-benefits and adverse side effects on society, the economy, and the


environment. For almost the same reasons as “instruments”, we chose the AR5 taxonomy, which categorizes climate mitigation impacts on economic, social, and environmental objectives/issues.


The detailed subobjective classification can be found in Supplementary Table 5. * BINDING FORCE: Whether the same policy instrument is legally binding affects its effectiveness, which is


also the core difference between hard law and soft law. The concept of soft law was originally common in the EU and international law54. At present, countries need to consider multiple goals


while mitigating climate change. Therefore, soft law has become a flexible choice when making climate commitments cautiously. Both soft and hard laws have advantages and disadvantages. Hard


law usually balances the interests of all parties and is more democratic but has a long promulgation cycle and high costs. Soft law is formulated and adopted more quickly but may be


less thoroughly considered and overly ambitious55. In addition, soft law is a good complement to hard law, and its functions can be divided into


pre-law functions, post-law functions, and para-law functions55. For soft law, we refer to the function-based classification (as an alternative to legislation) proposed by Senden55,


which divides soft law into three main categories: preparatory and informative instruments, interpretative and decisional instruments, and steering instruments. For hard laws, we mainly


consider the priority when two hard laws conflict, that is, the legal hierarchy56. We refer to the legal hierarchy summarized by Clegg, M. _et al_.56, combined with the legal hierarchy of


the European Union, the United States and other countries, and divide hard law into three main categories: constitution, statutes/legislation, and regulations/rules. The classification rules


and descriptions of soft law and hard law can be found in Supplementary Table 6, and some examples are shown in Supplementary Table 7. * EXECUTIVE/LEGISLATIVE: The enactment of policies by


executive or legislative bodies can partially measure the durability of policies (assuming that executive measures can more easily be removed by new governments, while laws are more durable,


given that their removal requires the approval of the legislative body), referring to Iacobuta _et al_.9 Policy durability has attracted attention9 because it affects the expectations of


policy implementation objects. For example, expectations regarding the durability of policies may affect short-term and long-term decision-making judgements of enterprises, thereby impacting


the effectiveness of policy implementation. What is slightly different is that, like the CCLW, the GCCMPD considers whether policies are enacted by the legislative or executive branch of


the government on the basis of law or strategy. For example, China’s 14th Five-Year Plan is classified as executive rather than legislative in both the CCLW and the GCCMPD. * JURISDICTION: In addition to


classifying the scope of policy from an industrial perspective, such as sector, the classification of policy jurisdiction can help to better analyse the interaction between central and


local governments57, international competition and multilateral cooperation58, and regional emissions governance59. Therefore, we divided policy jurisdictions into international, national,


subnational area (a broader category spanning multiple provinces), and subnational (single state/province or municipality/city) jurisdictions. DATA COLLECTION Data collection faced several challenges:


few websites offer data that can be directly packaged and downloaded, the webpage structure of each data source is different, and these structures are constantly updated. First, important


policy features were manually selected based on the characteristics of each website (all policies include at least title, content, and year). Then, Python toolkits such as Requests (a


library for sending HTTP requests) and Selenium (a tool for automating web browser interaction from Python) were used to obtain the data. All the raw data are stored


as files (CSV, Excel) and in a database (MongoDB). DATA PROCESSING Through steps 1-12 of the data processing shown in Fig. 1, we constructed a reference (original) version of


the training set data as of December 31, 2021. To ensure the consistency of manual annotation and reduce manual annotation errors, the classification information of the IEA, CP, and CCLW


datasets was used as much as possible. As shown in Supplementary Table 2, among the main indicators of the GCCMPD, the preliminary classifications of sector, instrument, objective (including


detailed classification of subsector, sector-instrument and subobjective), and jurisdiction can refer to the IEA, CP, and CCLW classification information. The two indicators of binding


force and executive/legislative can only partially refer to the IEA, CP, and CCLW classification information. Therefore, additional legal keywords were constructed with reference to legal


dictionaries, such as Black’s Law Dictionary (https://thelawdictionary.org/) and Duhaime’s Law Dictionary (http://www.duhaime.org/), to assist in classification. Accordingly, the processing steps were


intuitively divided into three categories: IEA, CP, and CCLW classification information (steps 2, 4, 6, and 11); IEA, CP, and CCLW information (steps 7, 8, and 9); and simple rules and


mappings (remaining steps). A detailed description of the processing steps can be found in the Supplementary Information (SI2); the mapping dictionary built based on IEA, CP, and CCLW can be


found in Supplementary Tables 8-16; and the keywords built with reference to the legal dictionaries can be found in Supplementary Tables 17-18. Although the


definitions of mitigation and adaptation differ, the two are strongly related60. Many climate policies involve both mitigation and adaptation, and datasets differ in


terms of whether the same climate policy involves mitigation or adaptation. The GCCMPD adopts a negative-list method: we constructed keywords (Supplementary Table 19) that strongly


characterize adaptation policies, and if any of these keywords appears in the policy title or content, the policy is judged to be a climate adaptation policy and is removed


from the dataset. Below are the descriptive statistics of the training set. The training set contains climate policy data from 199 entities (198 countries and the European Union). Table 2


illustrates the distribution of entities and policies on each continent. According to Table 2, Africa, Asia, and Europe have the most entities. Although Europe does not have the


largest number of entities, European entities issued the most policies among the six continents. The average number of policies issued by each entity is the


highest in Europe (89.96) and lowest in Africa (16.09). Our core dataset includes 10088 policies dated from 1927 to 2021 (several policies in the IEA dataset are scheduled to enter into force


in 2025). As Table 3 shows, national climate policies make up approximately 90% of our sample, while international policies account for 3% and subnational


policies for 7%. We divided the policies into five sectors and found that most of the policies targeted energy systems (3871) and that the fewest policies focused on AFOLU (838). There are


also 3843 policies that apply to the economy as a whole rather than to a single sector or several sectors. Policies are divided into seven types according to the instrument they apply.
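Because sector and instrument are multilabel indicators, one policy can count toward several categories, so category counts need not sum to the number of policies. A minimal sketch of such a tally follows; the records and field names are made up for illustration and are not GCCMPD data or the authors' code.

```python
from collections import Counter

# Hypothetical records: field names and values are illustrative only.
policies = [
    {"id": 1, "sector": ["energy systems"],
     "instrument": ["subsidies", "regulatory approaches"]},
    {"id": 2, "sector": ["transport", "buildings"],
     "instrument": ["regulatory approaches"]},
    {"id": 3, "sector": ["energy systems", "industry"],
     "instrument": ["taxes"]},
]


def tally(records, field):
    """Count how many records carry each label of a multilabel field."""
    counts = Counter()
    for record in records:
        # set() guards against a label being listed twice on one record
        counts.update(set(record[field]))
    return counts


sector_counts = tally(policies, "sector")
instrument_counts = tally(policies, "instrument")

print(sector_counts.most_common())
print(sum(sector_counts.values()), "sector labels across", len(policies), "policies")
```

In this toy example three policies carry five sector labels in total, which is why per-category counts are better read as label frequencies than as shares of a partition.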


Most policies employ the government provision of public goods or services (7040) and regulatory approaches (4467), while only 121 policies use the tradable allowance instrument. For the


binding force of the policies, soft laws constitute the larger share of the sample (69.73%), while hard laws constitute only 30.27%. Law/Act is the most common hard-law category, while


Preparatory Instruments is the most widely used soft-law category. Among co-benefit objectives, economic objectives are the most common, followed by social objectives. As Table 4 shows, by


combining these three datasets, the number of policies in each sector has almost doubled. All four datasets have a high share of energy sector policies. In addition, the integration of the


building sector and the AFOLU sector shows the benefits of coordination. The IEA dataset also has a high proportion of building sector policies, while the CCLW dataset has a


relatively high proportion of AFOLU sector policies. GCCMPD generates complementary advantages through the integration of these datasets, further highlighting the significant potential for


complementarity35 between the IEA, CP and CCLW. MANUAL CHECK, VERIFICATION, ANNOTATION CHECK FOR DUPLICATES Although merging the three datasets can combine their respective strengths, there


will be some duplication of policies, as the three datasets share many of the same sources. Duplicate policies do not affect analyses of, for example, policy coverage9,11,12 and policy


sequencing18,21, but they do affect analyses based on policy strength23,24. Using the Best Match 25 algorithm (BM2561, a ranking algorithm based on the probabilistic relevance framework that


generates a similarity score between two texts), we used all policy titles as a corpus, grouped the data by country, year, and jurisdiction, and calculated the text similarity score


between policies within each group. Manual checks guided by the text similarity scores then identified the duplicate policies (Supplementary Fig. 3). VERIFICATION AND ANNOTATION It is also necessary to


manually check, verify and annotate each indicator. Working from the reference version reduced manual labelling bias. During manual verification, we also found that some


policy instruments and sectors could not be identified directly from the policy content, and the reference version was used to assist the manual work in making more accurate annotations. After


manually checking and correcting the dictionary mapping several times, we found that even graduate students majoring in energy and climate were prone to labelling mistakes. Here are a few


illustrations of how policy sectors were manually checked. For home solar PV systems, we judged whether a policy belonged to the energy systems or buildings sector based on whether the system was connected


to the grid, because self-sufficiency alone cannot be considered energy supply. Heating and cooling generally refer to household heating and cooling appliances (buildings) rather than


district heating (energy systems). Fossil fuel extraction (energy systems) depends mainly on how the carbon emissions dataset is classified. Standards for building parking spaces to promote


the use of new energy vehicles are categorized as transport and building policies. Equipment safety and photovoltaics are considered industry policies. Biofuels are considered transport


policies, biomass power generation is assigned to energy systems, and the manufacture of biofuels may involve industry and AFOLU. It should be noted that climate policy data differ from


carbon emissions data. It is very common for a policy to involve multiple sectors. For the instrument indicator, the tax category is relatively easy to mislabel. To distinguish it from


subsidies, new taxes and tax increases are classified as taxes, and tax reductions and accelerated depreciation are classified as subsidies. Since the tax classification of the IEA is based


on whether a certain type of tax appears in the policy text, a tax reduction is also recorded as a tax and must be manually corrected. In addition, grants, awards, R&D


subsidies, etc., also need to be classified as subsidies and/or government provisions of public goods or services according to the content of the policy. Since the government provision of


public goods or services involves capacity building and the removal of barriers, it includes the establishment of institutions, the provision of financial support, etc. Note also


that the sub-level indicators (subsector, sector-instrument and subobjective) are more detailed but not exhaustively assigned; that is, a policy classified into a certain sector


(instrument or objective) is not necessarily further classified into a subsector (sector-instrument or subobjective) (Supplementary Figs. 4-6). For example, some energy plans


mentioned several sectors but did not involve specific subsectors. Some government funds support renewable energy power generation without specifying whether it is investment, R&D


support, or subsidies. SEARCH, VERIFICATION, AND SUPPLEMENTATION OF POLICY CONTENT The policy content also requires many manual searches and verification. According to Table 5, many policy


contents in the CP and IEA are missing or too short, which affects the manual checking of indicators, the training of machine learning models, and the topic models. We tried to recover the content of each policy by searching the source web page. For some countries whose official language is English (e.g., Canada), the source web page was invalid or incorrect; for countries whose official language is not English (e.g., Argentina, Spain, Germany, and Japan), there were also language problems. We used Google Search when the source web page was invalid or incorrect, optical character recognition (OCR) for scanned documents, and Google Translate for language issues. Through these efforts, the number of missing values in policy content was greatly reduced, and the quality of the policy content improved (Table 5). Manual verification also included translating the policy title and content into English and correcting the policy year and jurisdiction.

DATASET EXPANSION

After the above data processing and manual inspection, a refined training dataset containing 10,088 policies


was formed (GCCMPD-IEA-CP-CCLW). However, like other datasets, the GCCMPD-IEA-CP-CCLW still had shortcomings, such as high labour costs, slow updates, and difficulty of expansion (some data sources lack the relevant indicator information, so dictionary maps cannot be built for them). In this section, we used several state-of-the-art natural language processing (NLP) and machine learning techniques to expand the dataset. As shown in Table 6, by applying machine translation, multilabel classification, single-label classification, named entity recognition, text similarity, and topic models, the dataset could be constructed from policy titles and content alone.

MULTILABEL AND SINGLE-LABEL CLASSIFICATION

Of the GCCMPD’s indicators, sector, instrument, and objective are multilabel classifications, while binding force and executive/legislative are single-label classifications. We chose ClimateBERT62, which is based on the Bidirectional Encoder Representations from Transformers (BERT) model, for four reasons: it has high performance63,64; its fine-tuning cost is relatively low compared with large language models such as Generative Pretrained Transformer (GPT) or Large Language Model Meta AI (Llama); it focuses on climate change (it was pretrained on a text corpus of abstracts of climate-related research papers, corporate and general news, and company reports, and was the latest model in the field of climate change at the time of writing); and it is widely used in existing studies65,66. A comparison of ClimateBERT with the traditional machine learning models logistic regression (LR), naïve Bayes (NB), and support vector machine (SVM) (Supplementary Tables 21–25) demonstrates the advantages of ClimateBERT in the climate field. Details of model training and the comparisons can be found in the


Supplementary Information (SI3).

NAMED ENTITY RECOGNITION (NER)

Determining climate policy jurisdictions by using NER to identify the states/provinces and countries that appear in a policy is an intuitive and highly interpretable method. Specifically, we first used the RoBERTa-based67 “en_core_web_trf” model to identify geopolitical entities and then used “countryinfo” to distinguish “state/province” from “country”. Furthermore, the “Subnational area” and “International” categories were determined according to the number of different cities and countries in a policy (Supplementary Table 26).

TEXT SIMILARITY

In the GCCMPD-IEA-CP-CCLW, after calculating BM25 scores, we identified duplicate policies from different sources by manual checking in groups.


However, for data expansion, two issues needed to be considered:

1. When the amount of data increases rapidly, labour-intensive methods are no longer applicable.
2. How can we objectively judge whether two policies are the same?

Some text similarity methods, such as bag-of-words/TF-IDF combined with cosine similarity, yield a standardized value, but a threshold such as 0.8 still has to be set subjectively, and the methods are relatively coarse. We therefore continued to use the BM25 algorithm and borrowed the ideas of model training and evaluation to find duplicates automatically (Fig. 2). The optimal BM25 score was determined from the optimal ranking of 6061 and an F1 score of 0.8 (Supplementary Fig. 7). A policy with a similarity score greater than this optimal BM25 score was judged to be a duplicate. Technical details and instructions can be found in the Supplementary Information (SI2).
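The idea of choosing the BM25 cut-off by evaluation rather than by hand can be illustrated with a minimal sketch. It assumes a set of candidate policy pairs that already carry a BM25 score and a manual duplicate/non-duplicate label, sweeps every observed score as a candidate cut-off, and keeps the one with the highest F1. The function names and the toy scores are hypothetical; the procedure actually used for the GCCMPD is the one documented in SI2, which this code does not claim to reproduce.

```python
from typing import List, Tuple


def f1_at_threshold(scores: List[float], labels: List[bool], threshold: float) -> float:
    """F1 when every pair scoring above `threshold` is predicted to be a duplicate."""
    predicted = [score > threshold for score in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores: List[float], labels: List[bool]) -> Tuple[float, float]:
    """Sweep the observed scores as cut-offs and return (cut-off, F1) with the highest F1."""
    # A value below the minimum lets the sweep also consider "everything is a duplicate".
    candidates = [min(scores) - 1.0] + sorted(set(scores))
    best_cut, best_f1 = candidates[0], -1.0
    for cut in candidates:
        f1 = f1_at_threshold(scores, labels, cut)
        if f1 > best_f1:  # strict comparison keeps the lowest cut-off on ties
            best_cut, best_f1 = cut, f1
    return best_cut, best_f1


# Toy data (made up): BM25 scores of candidate pairs and manual duplicate labels.
toy_scores = [30.0, 25.0, 12.0, 8.0, 3.0]
toy_labels = [True, True, False, True, False]
print(best_threshold(toy_scores, toy_labels))
```

On the toy data the sweep selects the cut-off 3.0, at which three of the four pairs above the cut-off are true duplicates and no duplicate is missed. Sweeping cut-offs over sorted scores is equivalent to choosing a rank position in the sorted list, which is how the optimal ranking and the optimal score relate in this sketch.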


TOPIC MODEL

Policy topic analysis can aid in understanding policy evolution17 and has become a new branch of research in the field of public policy. BERTopic68 is a state-of-the-art topic model that works in three steps: it uses embeddings to transform input documents into numerical representations, applies dimensionality reduction to bring the embeddings into a workable space, and clusters similar embeddings, from which topic representations are obtained through c-TF-IDF (a TF-IDF variant adjusted to work at the cluster/category/topic level instead of the document level). The GCCMPD provides four topic models for different analysis needs: GCCMPD-IEA-CP-CCLW-Topic, GCCMPD-EXPAND-Topic (counting multilateral policies as multiple), GCCMPD-Topic (counting multilateral policies as one), and GCCMPD-EXCEPT-ECOLEX-Topic (excluding ECOLEX). The specific model details can be found in the Supplementary Information


(SI2).

DATA RECORDS

To facilitate use by researchers, government staff, and personnel of international organizations, the GCCMPD is provided in three storage formats: MySQL, MongoDB, and Excel. All data records have been uploaded to Figshare: https://doi.org/10.6084/m9.figshare.22590028.v2 (ref. 69). The relational database (MySQL) reflects the composition of the core data well (Fig. 3):

* POLICY: contains all deduplicated climate mitigation policies (policy ID, policy title and content translated into English, policy source).
* ADDITIONAL INFORMATION: includes the year the policy was published, the original title and content of the policy, and the web page of the policy source document.
* CHARACTERISTIC: contains the indicators (sector, objective, instrument, executive/legislative, and binding force) for each policy.
* SIMILAR INFORMATION: provides the total number of policies in each policy group, the index, the BM25 score, and the title of the most similar policy in the group.
* COUNTRY AND JURISDICTION: provides information about the country where the policy was adopted. The table is also linked to external datasets on countries’ economic (e.g., GDP, imports and exports), social (e.g., population, democracy), political (e.g., left-wing or right-wing, federal), and energy (e.g., primary energy usage) characteristics.
* TOPIC: provides the topic category of each policy, the topic name, the most representative words under this topic, the probability that the policy belongs to this topic, and whether it is a representative policy of this topic.

In addition to the core data, the GCCMPD also retains the processing results of each step.
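How the core tables listed above relate to one another can be pictured with a small relational sketch. It uses Python's built-in `sqlite3` module in place of MySQL so that it runs without a server, and models only the POLICY and CHARACTERISTIC tables. All table names, column names, and rows here are hypothetical illustrations; the actual schema is the one shown in Fig. 3 and in the Figshare files.

```python
import sqlite3

# In-memory database standing in for the MySQL store (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE policy (
        policy_id  INTEGER PRIMARY KEY,
        title_en   TEXT,
        content_en TEXT,
        source     TEXT
    );
    -- One row per (policy, indicator, value), so multilabel indicators
    -- such as sector can hold several values for the same policy.
    CREATE TABLE characteristic (
        policy_id INTEGER REFERENCES policy(policy_id),
        indicator TEXT,
        value     TEXT
    );
    """
)

# Made-up example rows.
conn.execute(
    "INSERT INTO policy VALUES (1, 'Example feed-in tariff act', 'Illustrative text.', 'IEA')"
)
conn.executemany(
    "INSERT INTO characteristic VALUES (?, ?, ?)",
    [(1, "sector", "Energy systems"), (1, "instrument", "Subsidies")],
)

# Join the two tables on the policy ID to list each policy with its sectors.
rows = conn.execute(
    """
    SELECT p.title_en, c.indicator, c.value
    FROM policy AS p
    JOIN characteristic AS c ON c.policy_id = p.policy_id
    WHERE c.indicator = 'sector'
    """
).fetchall()
print(rows)
```

The long (policy, indicator, value) layout is one possible design choice for storing multilabel indicators; a wide table with one column per label would serve equally well for spreadsheet-style use.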


Supplementary Table 27 lists some important result files and their descriptions. These files are all stored in Excel.

TECHNICAL VALIDATION

The data quality of the GCCMPD is affected by several factors. The first is the data source: the comprehensiveness of information varies greatly across data sources, which directly affects how well the models can be trained. The second is the reliability of the annotations and training sets and the learning capacity of the model. Because the GCCMPD is constructed by first forming a training set through manual labelling and then extrapolating to a richer set of data sources, the quality of the annotations and the training set is highly important. We took the following steps to enhance the data quality:

* MANUAL DATA RETRIEVAL: Data sources were identified through high-quality literature, Google searches, and the sources used by authoritative datasets. The requirements for a data source were as follows: 1. it includes the title and content of the policy, without too many missing values; 2. it includes the year of the policy; 3. it contains as much information as possible about the entity that released the policy.
* MANUAL ANNOTATIONS AND CHECKS: We compared the annotations of PhD students in multiple climate and energy fields (with each person


annotating a small number of fields) and found that the differences in their annotations were concentrated in a few places (see the Manual check, verification, annotation section). Although we provided classification criteria and explained where misclassification is likely, manual annotation was inevitably subjective. Dictionary mapping, in contrast, provides objectivity and consistency. The IEA, CP, and CCLW data can be regarded as the work of authoritative experts, so we treated them as true values and the GCCMPD data as predicted values and compared the two (Tables 7–11; for the comparison of subsectors, sector-instruments, and subobjectives, see Supplementary Tables 28–30). We call this a comparison because the GCCMPD values here are not model predictions; we only borrow the metrics to make the comparison more intuitive. The resulting recall rates were generally high, in some cases close to 1, which means that we largely adopted the results of dictionary mapping. Categories with lower recall (such as Other Strategy Plan or Target) are those whose results we later focused on checking. Low precision rates arose mostly because the original policy data source did not have corresponding keyword labels, and we added the annotations during manual verification and inspection. The precision rate of the taxes category was not high because, on checking, we found that the classification standard of the IEA dataset differed from that of the GCCMPD, while the precision rate of the subsidies category was not high because keywords such as “Award” and “Grants” do not by themselves determine whether a policy is a subsidy. We therefore did not construct a dictionary for these keywords and instead added annotations through manual inspection. For the manual inspection of duplicate data, we also tried to be as detailed as possible and recorded which policies duplicate which.
* ENHANCED MODEL TRAINING CAPABILITIES: In addition to using state-of-the-art models, we improved model learning by translating non-English policies into English and by checking and completing policy content. When manually checking policy content, we selected content that summarizes the policy, such as goals, purposes, and objectives, as well as information that helps in judging the legal category.
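The comparison used in the manual annotation check, with source labels treated as true values and GCCMPD labels as predicted values, can be sketched as a per-label precision and recall computation over multilabel sets. This is a minimal illustration under stated assumptions: the function name, the label strings, and the three toy policies are hypothetical, and the published figures are those in Tables 7–11, not anything produced by this code.

```python
from typing import Dict, List, Set


def per_label_precision_recall(
    reference: List[Set[str]], final: List[Set[str]]
) -> Dict[str, Dict[str, float]]:
    """Per-label precision and recall, with `reference` as truth and `final` as prediction."""
    labels = set().union(*reference, *final)
    pairs = list(zip(reference, final))
    report = {}
    for label in sorted(labels):
        tp = sum(label in ref and label in fin for ref, fin in pairs)
        fp = sum(label not in ref and label in fin for ref, fin in pairs)
        fn = sum(label in ref and label not in fin for ref, fin in pairs)
        report[label] = {
            # Precision drops when labels were added that the source did not carry.
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            # Recall drops when a source label was removed during checking.
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return report


# Toy example (made up): dictionary-mapped source labels vs. labels after manual checks.
source_labels = [{"transport"}, {"transport"}, {"industry"}]
checked_labels = [{"transport"}, {"transport", "buildings"}, {"industry", "transport"}]
report = per_label_precision_recall(source_labels, checked_labels)
print(report["transport"])
```

In the toy data no source label is removed, so recall for the labels present in the source stays at 1, while the label added to the third policy lowers the precision of "transport". This mirrors the pattern described above, where manual additions show up as lower precision rather than lower recall.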


After the GCCMPD data are made public, we will also extend the data sources and correct errors through interactions with data users. Nonetheless, our dataset still has limitations, one of which is that it does not contain adaptation policies. Climate adaptation is a very important field70, especially for countries with high climate vulnerability and high mitigation costs. Future research can build on our study by constructing an adaptation policy dataset.

CODE AVAILABILITY

The code, dataset, and some intermediate results are freely available in the following GitHub


repository: https://github.com/HUANGZHIHAO1994/GCCMPD-climate-policy-dataset.

REFERENCES

* Zhu, J., Ge, Z., Wang, J., Li, X. & Wang, C. Evaluating regional carbon emissions trading in China: effects, pathways, co-benefits, spillovers, and prospects. _Climate Policy_ 22, 918–934 (2022).
* Zhu, J., Fan, Y., Deng, X. & Xue, L. Low-carbon innovation induced by emissions trading in China. _Nat Commun_ 10, 4088 (2019).
* Cui, J., Wang, C., Zhang, J. & Zheng, Y. The effectiveness of China’s regional carbon market pilots in reducing firm emissions. _Proc. Natl. Acad. Sci. USA_ 118, e2109912118 (2021).
* Yamazaki, A. Jobs and climate policy: Evidence from British Columbia’s revenue-neutral carbon tax. _Journal of Environmental Economics and Management_ 83, 197–216 (2017).
* Maestre-Andrés, S., Drews, S., Savin, I. & van den Bergh, J. Carbon tax acceptability with information provision and mixed revenue uses. _Nat Commun_ 12, 7017 (2021).
* Liski, M. & Tahvonen, O. Can carbon tax eat OPEC’s rents? _Journal of Environmental Economics and Management_ 47, 1–12 (2004).
* Demailly, D. & Quirion, P. European Emission Trading Scheme and competitiveness: A case study on the iron and steel industry. _Energy Economics_ 30, 2009–2027 (2008).


* Nong, D., Simshauser, P. & Nguyen, D. B. Greenhouse gas emissions vs CO2 emissions: Comparative analysis of a global carbon tax. _Applied Energy_ 298, 117223 (2021).
* Iacobuta, G., Dubash, N. K., Upadhyaya, P., Deribe, M. & Höhne, N. National climate change mitigation legislation, strategy and targets: a global update. _Climate Policy_ 18, 1114–1132 (2018).
* Lachapelle, E. & Paterson, M. Drivers of national climate policy. _Climate Policy_ 13, 547–571 (2013).
* Nascimento, L. _et al_. Twenty years of climate policy: G20 coverage and gaps. _Climate Policy_ 22, 158–174 (2022).
* Dubash, N. K., Hagemann, M., Höhne, N. & Upadhyaya, P. Developments in national climate change mitigation legislation and strategy. _Climate Policy_ 13, 649–664 (2013).
* Fankhauser, S., Gennaioli, C. & Collins, M. Do international factors influence the passage of climate change legislation? _Climate Policy_ 16, 318–331 (2016).
* Fankhauser, S., Gennaioli, C. & Collins, M. The political economy of passing climate change legislation: Evidence from a survey. _Global Environmental Change_ 35, 52–61 (2015).
* Tews, K., Busch, P.-O. & Jörgens, H. The diffusion of new environmental policy instruments. _Eur J Political Res_ 42, 569–600 (2003).
* Busch, P. & Jörgens, H. The international sources of policy convergence: explaining the spread of environmental policy innovations. _Journal of European Public Policy_ 12, 860–884 (2005).
* Biesbroek, R., Wright, S. J., Eguren, S. K., Bonotto, A. & Athanasiadis, I. N. Policy attention to climate change impacts, adaptation and vulnerability: a global assessment of National Communications (1994–2019). _Climate Policy_ 22, 97–111 (2022).
* Linsenmeier, M., Mohommad, A. & Schwerhoff, G. Policy sequencing towards carbon pricing among the world’s largest emitters. _Nat. Clim. Chang._ 12, 1107–1110 (2022).
* Averchenkova, A., Fankhauser, S. & Nachmany, M. _Trends in Climate Change Legislation_. (Edward Elgar Publishing, Cheltenham, UK; Northampton, MA, USA, 2017).
* Meckling, J. & Allan, B. B. The evolution of ideas in global climate policy. _Nat. Clim. Chang._ 10, 434–438 (2020).
* Meckling, J., Sterner, T. & Wagner, G. Policy sequencing toward decarbonization. _Nat Energy_ 2, 918–922 (2017).
* Le Quéré, C. _et al_. Drivers of declining CO2 emissions in 18 developed economies. _Nat. Clim. Chang._ 9, 213–217 (2019).
* Eskander, S. M. S. U. & Fankhauser, S. Reduction in greenhouse gas emissions from national climate legislation. _Nature Climate Change_ 10, 750–756 (2020).
* Chen, P. _et al_. The heterogeneous role of energy policies in the energy transition of Asia–Pacific emerging economies. _Nature Energy_ 7, 588–596 (2022).
* Lamb, W. F. & Minx, J. C. The political economy of national climate policy: Architectures of constraint and a typology of countries. _Energy Research & Social Science_ 64, 101429 (2020).
* Peñasco, C., Anadón, L. D. & Verdolini, E. Systematic review of the outcomes and trade-offs of ten types of decarbonization policy instruments. _Nat. Clim. Chang._ 11, 257–265 (2021).
* Rogge, K. S. & Reichardt, K. Policy mixes for sustainability transitions: An extended concept and framework for analysis. _Research Policy_ 45, 1620–1635 (2016).
* van den Bergh, J. _et al_. Designing an effective climate-policy mix: accounting for instrument synergy. _Climate Policy_ 21, 745–764 (2021).
* Schmidt, T. S. & Sewerin, S. Measuring the temporal dynamics of policy mixes – An empirical analysis of renewable energy policy mixes’ balance and design features in nine countries. _Research Policy_ 48, 103557 (2019).
* Schmidt, N. M. & Fleig, A. Global patterns of national climate policies: Analyzing 171 country portfolios on climate policy integration. _Environmental Science & Policy_ 84, 177–185 (2018).
* Fankhauser, S., Hepburn, C. & Park, J. Combining multiple climate policy instruments: how not to do it. _Clim. Change Econ._ 01, 209–225 (2010).
* Viguié, V. & Hallegatte, S. Trade-offs and synergies in urban climate policies. _Nature Clim Change_ 2, 334–337 (2012).
* Persha, L., Agrawal, A. & Chhatre, A. Social and Ecological Synergy: Local Rulemaking, Forest Livelihoods, and Biodiversity Conservation. _Science_ 331, 1606–1608 (2011).
* Bryan, B. A. _et al_. Designer policy for carbon and biodiversity co-benefits under global change. _Nature Clim Change_ 6, 301–305 (2016).
* Schaub, S., Tosun, J., Jordan, A. & Enguer, J. Climate Policy Ambition: Exploring A Policy Density Perspective. _Politics and Governance_ 10, 226–238 (2022).
* Nascimento, L. & Höhne, N. Expanding climate policy adoption improves national mitigation efforts. _npj Clim. Action_ 2, 12 (2023).
* Schaffrin, A., Sewerin, S. & Seubert, S. Toward a Comparative Measure of Climate Policy Output. _Policy Studies Journal_ 43, 257–282 (2015).
* Clò, S. The effectiveness of the EU Emissions Trading Scheme. _Climate Policy_ 9, 227–241 (2009).
* Sandoff, A. & Schaad, G. Does EU ETS lead to emission reductions through trade? The case of the Swedish emissions trading sector participants. _Energy Policy_ 37, 3967–3977 (2009).
* Castro, P. Does the CDM discourage emission reduction targets in advanced developing countries? _Climate Policy_ 12, 198–218 (2012).


* Lin, B. & Li, X. The effect of carbon tax on per capita CO2 emissions. _Energy Policy_ 39, 5137–5146 (2011).
* Desmarais, B. A., Harden, J. J. & Boehmke, F. J. Persistent Policy Pathways: Inferring Diffusion Networks in the American States. _American Political Science Review_ 109, 392–406 (2015).
* Lawrence, P. & Wong, D. Soft law in the Paris Climate Agreement: Strength or weakness? _Review of European, Comparative & International Environmental Law_ 26, 276–286 (2017).
* Vihma, A. Analyzing Soft Law and Hard Law in Climate Change. in _Climate Change and the Law_ (eds. Hollo, E. J., Kulovesi, K. & Mehling, M.) 143–164 (Springer Netherlands, Dordrecht, 2013). https://doi.org/10.1007/978-94-007-5440-9_7.
* Thompson, T. M., Rausch, S., Saari, R. K. & Selin, N. E. A systems approach to evaluating the air quality co-benefits of US carbon policies. _Nature Clim Change_ 4, 917–923 (2014).
* Gütschow, J. _et al_. The PRIMAP-hist national historical emissions time series. _Earth Syst. Sci. Data_ 8, 571–603 (2016).
* Jeffery, M. L., Gütschow, J., Gieseke, R. & Gebel, R. PRIMAP-crf: UNFCCC CRF data in IPCC 2006 categories. _Earth Syst. Sci. Data_ 10, 1427–1438 (2018).
* Eggleston, H. S., Buendia, L., Miwa, K., Ngara, T. & Tanabe, K. 2006 IPCC Guidelines for National Greenhouse Gas Inventories. (2006).
* Lamb, W. F. _et al_. A review of trends and drivers of greenhouse gas emissions by sector from 1990 to 2018. _Environ. Res. Lett._ 16, 073005 (2021).


* Minx, J. C. _et al_. A comprehensive and synthetic dataset for global, regional, and national greenhouse gas emissions by sector 1970–2018 with an extension to 2019. _Earth Syst. Sci. Data_ 13, 5213–5252, https://doi.org/10.5194/essd-13-5213-2021 (2021).
* Hood, C., Margetts, H. & Hood, C. _The Tools of Government in the Digital Age_. (Palgrave Macmillan, Basingstoke, 2007).
* IPCC. _Climate Change 2014: Mitigation of Climate Change: Working Group III Contribution to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change_. (Cambridge University Press, New York, NY, 2014).
* Metz, B., Davidson, O., Swart, R. & Pan, J. _Climate Change 2001: Mitigation: Contribution of Working Group III to the Third Assessment Report of the Intergovernmental Panel on Climate Change_. (Cambridge University Press, 2001).
* Robilant, A. D. Genealogies of Soft Law. _The American Journal of Comparative Law_ 54, 499–554 (2006).
* Senden, L. _Soft Law in European Community Law_. vol. 1 (Hart Publishing, 2004).
* Clegg, M., Ellena, K., Ennis, D. & Vickery, C. _The hierarchy of laws: understanding and implementing the legal frameworks that govern election_. International Foundation for Electoral Systems, Arlington, VA (2016).
* Di Gregorio, M. _et al_. Multi-level governance and power in climate change policy networks. _Global Environmental Change_ 54, 64–77 (2019).
* Doukas, H., Karakosta, C. & Psarras, J. RES technology transfer within the new climate regime: A “helicopter” view under the CDM. _Renewable and Sustainable Energy Reviews_ 13, 1138–1143 (2009).


* Bulkeley, H. Cities and the Governing of Climate Change. _Annual Review of Environment and Resources_ 35, 229–253 (2010).
* Parry, M. L., Canziani, O., Palutikof, J., Van der Linden, P. & Hanson, C. _Climate Change 2007-Impacts, Adaptation and Vulnerability: Working Group II Contribution to the Fourth Assessment Report of the IPCC_. vol. 4 (Cambridge University Press, 2007).
* Robertson, S. & Zaragoza, H. The Probabilistic Relevance Framework: BM25 and Beyond. _FNT in Information Retrieval_ 3, 333–389 (2009).
* Webersinke, N., Kraus, M., Bingler, J. & Leippold, M. ClimateBERT: A Pretrained Language Model for Climate-Related Text. _arXiv preprint arXiv:2110.12010_ (2021).
* Gök, A., Antai, R., Milošević, N. & Al-Nabki, W. Building the European Social Innovation Database with Natural Language Processing and Machine Learning. _Sci Data_ 9, 697 (2022).
* Minaee, S. _et al_. Deep learning–based text classification: a comprehensive review. _ACM computing surveys (CSUR)_ 54, 1–40 (2021).
* Kölbel, J. F., Leippold, M., Rillaerts, J. & Wang, Q. Ask BERT: How Regulatory Disclosure of Transition and Physical Climate Risks Affects the CDS Term Structure. _Journal of Financial Econometrics_ 22, 30–69 (2024).
* Bingler, J. A., Kraus, M., Leippold, M. & Webersinke, N. Cheap talk and cherry-picking: What ClimateBert has to say on corporate climate risk disclosures. _Finance Research Letters_ 47, 102776 (2022).
* Liu, Y. _et al_. Roberta: A robustly optimized bert pretraining approach. _arXiv preprint arXiv:1907.11692_ (2019).
* Grootendorst, M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. _arXiv preprint arXiv:2203.05794_ (2022).
* Huang, Z., Wu, L., Zhang, X. & Wang, Y. Global Climate Change Mitigation Policy Database. _Figshare_ https://doi.org/10.6084/m9.figshare.22590028.v2 (2023).
* Lesnikowski, A., Ford, J., Biesbroek, R., Berrang-Ford, L. & Heymann, S. J. National-level progress on adaptation. _Nature Clim Change_ 6, 261–264 (2016).

ACKNOWLEDGEMENTS

This study acknowledges financial support from the National Natural Science Foundation of China (71925010).

AUTHOR INFORMATION

AUTHORS AND AFFILIATIONS

* School of Data Science, Fudan University, Shanghai, 200433, China: Libo Wu, Zhihao Huang, Xing Zhang & Yushi Wang
* Institute for Big Data, Fudan University, Shanghai, 200433, China: Libo Wu
* School of Economics, Fudan University, Shanghai, 200433, China: Libo Wu
* Shanghai Institute for Energy and Carbon Neutrality Strategy, Fudan University, Shanghai, 200433, China: Libo Wu

CONTRIBUTIONS

L.W. conceived the study. Z.H. performed analysis. All authors (Z.H., Y.W., X.Z. and L.W.) interpreted the data. Z.H. prepared the manuscript. Z.H., Y.W. and L.W. revised the manuscript.

CORRESPONDING AUTHOR

Correspondence to Libo Wu.

ETHICS DECLARATIONS

COMPETING INTERESTS

The authors declare no competing interests.

ADDITIONAL INFORMATION

PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

SUPPLEMENTARY INFORMATION

Supplementary Figures

RIGHTS AND PERMISSIONS

OPEN ACCESS This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

ABOUT THIS ARTICLE

CITE THIS ARTICLE

Wu, L., Huang, Z., Zhang, X. _et al._ Harmonizing existing climate change mitigation policy datasets with a hybrid machine learning approach. _Sci Data_ 11, 580 (2024). https://doi.org/10.1038/s41597-024-03411-z

* Received: 01 November 2023
* Accepted: 23 May 2024
* Published: 04 June 2024
* DOI: https://doi.org/10.1038/s41597-024-03411-z