Comparing country policies in support of public research using the STIP Compass dataset¶

By Gildas Ehrmann https://www.linkedin.com/in/gildasehrmann/

1. Introduction: Motivation and scope of this work¶

The 2019 edition of the STIP Compass database provides a wealth of information on more than 5000 policy initiatives for 51 countries and the European Union. However, the portal’s current interactive dashboards do not allow to easily compare policy choices across countries. There is nevertheless appetite from policy makers, from policy analysts to get country overviews in specific policy areas, leveraging on that rich dataset, and to have the ability to compare countries against one anothers.

To address this need, this notebooks aims to pilot visualisations of policy initiative data for five selected OECD Member Countries (Canada, France, Germany, Japan and Korea). These countries are selected as the quality of their data is relatively higher compared to that of others in the database. Using the 2019 STIP Compass dataset, this work develops code in Python that visualise detailed aspects about public research policies and related policy instruments. While the scope of this analysis only covers a fraction of the STI policy field, it could be replicated and extended to approach other policy areas covered by the STIP Compass database (e.g. business innovation, technology transfer).

This notebook is structured as follows. Section 2 describes how the STIP Compass dataset is downloaded, pre-processed and made ready for analysis. Section 3 provides an overview of policies reported in the “public research system” policy area of the dataset across the five selected countries. Section 4 focuses on specific policy instruments in support of public research, namely institutional funding for public research, project grants for public research and dedicated support to research infrastructure. Lastly, Section 5 reflects on the outcomes and limitations of this pilot, together with possible next steps.

2. Preparing the STIP Compass dataset¶

This notebook works directly using the full 2019 STIP Compass database published by the STIP Data Lab. This first section retrieve the dataset from the web and conducts a few preliminary operations., In particular, the relevant data is trimmed, arranged and formatted. It produces an extract of essential columns and introduces a few variables used throughout the analysis.

2.1. Loading the dataset¶

A dictionary (Dict_Headers) is created from the second row of the dataset to store and save headers for later use. Some corrections are made to reduce length of field "Long Description". On the dataset downloaded, corrections are made on variables including cleaning the fields "InitiativeID", "InstrumentID", aggregating the CountryLabel/CountryCode for Belgian reporting entities. All variables are transformed into numerical variables when appropriate. An extra field is added to store countries' OECD status (Member or Partner). All the variables created throughout the notebook will need to be added to the dictionary (Dict_Headers). Hence a large amount of variables named in the format "isbudgetedxxx" are declared in the dictionary. These will be defined later and used in the color shading process of graphs but need to be declared in the dictionary for simplicity.

2.2. Procedure to measure initiatives and instruments weights¶

The STIP Compass dataset maps budgetary data for policy initiatives. More specifically, it indicates on their estimated yearly budget expenditures using a series of ranges (Table 1). To consider the scale of an initiative or an instrument, the net budget weight (NBW) is computed depending on each initiative budget range, as per Table 1. As with the rest of the STIP Compass policy data, the budget range used to compute NBW is reported by countries; when missing, it takes a value of 0. The NBW number of initiatives is obtained by adding 1 to the budget weight coefficient. In this notebook, most calculations count initiatives using the NBW value, which would accordingly range from 1 to 6.

Table 1. Budget ranges and their associated weight coefficients

Budget Range (in Million Euros)	Budget weight coefficient	Net budget-weighted initiatives
Don’t know / Not applicable	0 (no weight)	1
Less than 1 M	0.005	1.005
1 M – 5 M	0.01	1.01
5 M – 20 M	0.05	1.05
20 M – 50 M	0.2	1.2
50 M – 100 M	0.5	1.5
100 M – 500 M	1	2
More than 500 M	5	6

2.3. Selecting countries¶

The countries to include in the analysis are stored in a dataframe (my_countries). Country selection can be changed at will from the list of 38 OECD Member Countriess and 20+ countries and territories participating in STIP Compass data, although the degree and quality of reporting varies across countries. In this pilot notebook, 5 OECD member countries were selected: Canada, France, Germany, Japan and Korea.

These preliminary steps mentioned above are conducted in the code cell below (click to expand/collapse).

# Import necessary libraries:

import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Import the dataset

# Note: load online stip compass data using pipe '|' separator and skipping the second header (multi-indexing causes problems in the filtering)
#url = 'https://stip.oecd.org/assets/downloads/STIP_Survey.csv'
#compass_df = pd.read_csv(url, sep='|', skiprows=[1])

compass_df = pd.read_csv(r'C:\Users\Gildas\OneDrive\MSC SUSDEV\[OECD]\STIP Introduction\research precariat\STIP_Survey.csv', low_memory=False)

# Prepare the dataset: Trim data/ Correct issues in format / Create dictionnaries
    # Create a dictionary of column codes with column headers

Dict_Headers = compass_df.head(1).T.reset_index()
Dict_Headers.loc[1] = ['Code', 'Long Description']
new_header = Dict_Headers.iloc[1]
Dict_Headers = Dict_Headers[2:] 
Dict_Headers.columns = new_header
Dict_Headers.reset_index(inplace=True, drop=False)
Dict_Headers.drop(['index'], axis=1, inplace=True)

    #Dict_Headers['Long Description'].replace(r':', r':<br>')
Dict_Headers

    # add additional data to dictionnary
new_row2 = pd.DataFrame([['F41:V31', 'Funding amount allocated for an average time-period of:Not specified'], ['Unique', 'Unique'], ['OECD_STATUS', 'OECD_STATUS'], ['InstrumentID', 'InstrumentID'], ['NBW', 'NBW'], ['Total', 'Total'], ['isbudgeted', 'isbudgeted'], ['isbudgeted28', 'isbudgeted28'], ['isbudgeted29', 'isbudgeted29'], ['isbudgeted30', 'isbudgeted30'], ['isbudgeted31', 'isbudgeted31'], ['isbudgeted33', 'isbudgeted33'], ['isbudgeted34', 'isbudgeted34'], ['isbudgeted35', 'isbudgeted35'], ['isbudgeted804', 'isbudgeted804'],
                         ['isbudgeted805', 'isbudgeted805'], ['F78:V31', 'Not specified'], ['F76:V31', 'Not specified'], ['isbudgeted37', 'isbudgeted37'], ['isbudgeted38', 'isbudgeted38'], ['isbudgeted39', 'isbudgeted39'], ['isbudgeted40', 'isbudgeted40'], ['isbudgeted41', 'isbudgeted41'], ['isbudgeted806','isbudgeted806'],
                         ['isbudgeted856', 'isbudgeted856'], ['isbudgeted857','isbudgeted857'], ['isbudgeted858','isbudgeted858'], ['isbudgeted859','isbudgeted859'], ['isbudgeted860','isbudgeted860'], ['isbudgeted861','isbudgeted861'], ['isbudgeted862','isbudgeted862'],['isbudgeted863','isbudgeted863'],['F120:V863', 'Not provided']],columns=['Code', 'Long Description'])

    # Merge the two back dataframes together
Dict_Headers = Dict_Headers.append(new_row2, ignore_index=True)

    #Trim the dictionnary
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Maximum amount of grant awarded in euros:", "").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Funding amount allocated for an average time-period of:", "").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Performance-based element to the allocation:", "").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Criteria for funding:", "").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Type of activity:", "").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Requires a form of collaboration:", "").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Selection criteria:", "").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Share of public funding \(as a % of total funding of the centre of excellence\):", "").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Requires a form of collaborative research:", "").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Promotes international mobility of students and researchers:", "").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Promotes intersectoral mobility \(e.g. between the academic and private sectors\):", "").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Objective\(s\):", "").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace(" \(e.g. technology, innovation\)", "").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace(" \(e.g. contract research, other grants\)", "").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace(" \(to promote regional or cluster policy\)", "").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace(" \(excellence\)", "").str.strip()  
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Country name", "CountryLabel").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Country code", "CountryCode").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace(" \(e.g. contract research, competitive grants\)", "").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace(" intellectual property"," IP").str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace(" \(e.g. women and other under-represented groups\)", "").str.strip()

    # First sets of treatment on data:

compass_df['NameResponsibleOrganisation'].replace(r'', np.NaN)
compass_df.Tags.fillna("¬", inplace=True)
compass_df.YearlyBudgetRange.fillna("Don't know", inplace=True)
compass_df.drop(index=compass_df.index[0], 
        axis=0, 
        inplace=True)

    # There's a few organisations missing - for the corresponding initiatives to still be retrieved, replace NaN with 'Unknown organisation':

compass_df['NameResponsibleOrganisation'].fillna('Unknown organisation',inplace=True)

    # Shorten the initiatives ID, removing HTTP references
compass_df['InitiativeID'] = compass_df['InitiativeID'].str[-5:]
compass_df['InitiativeID'] = compass_df['InitiativeID'].str.replace("/", "").str.strip()

compass_df['InstrumentID'] = compass_df['InstrumentID'].str[-5:]
compass_df['InstrumentID'] = compass_df['InstrumentID'].str.replace("/", "").str.strip()

    # The case of Belgium: unify reporting entities under one country

compass_df['CountryLabel'] = compass_df['CountryLabel'].str.replace("Belgium - Brussels Capital", "Belgium").str.strip()
compass_df['CountryLabel'] = compass_df['CountryLabel'].str.replace("Belgium - Federal government", "Belgium").str.strip()
compass_df['CountryLabel'] = compass_df['CountryLabel'].str.replace("Belgium - Flanders", "Belgium").str.strip()
compass_df['CountryLabel'] = compass_df['CountryLabel'].str.replace("Belgium - Wallonia", "Belgium").str.strip()
compass_df['CountryLabel'] = compass_df['CountryLabel'].str.replace("Belgium - Wallonia-Brussels Federation", "Belgium").str.strip()
compass_df['CountryLabel'] = compass_df['CountryLabel'].str.replace("Belgium-Brussels Federation", "Belgium").str.strip()

compass_df['CountryCode'] = compass_df['CountryCode'].str.replace("BEBRU", "BEL").str.strip()
compass_df['CountryCode'] = compass_df['CountryCode'].str.replace("BEFED", "BEL").str.strip()
compass_df['CountryCode'] = compass_df['CountryCode'].str.replace("BEVLG", "BEL").str.strip()
compass_df['CountryCode'] = compass_df['CountryCode'].str.replace("BEWAL", "BEL").str.strip()
compass_df['CountryCode'] = compass_df['CountryCode'].str.replace("BEWBF", "BEL").str.strip()
compass_df['CountryCode'] = compass_df['CountryCode'].str.replace("BEWBF", "BEL").str.strip()

    # Ensure all the data is in the right format: 

for x in range(0, compass_df.shape[1]):
        compass_df.iloc[:,x] = compass_df.iloc[:,x].astype(int, errors='ignore')


# Complement the dataset:

    # Introduce a variable called PRS to filter PRS-specific data:

compass_df['PRS'] = pd.to_numeric(compass_df.TH10) + pd.to_numeric(compass_df.TH18) + pd.to_numeric(compass_df.TH19) + pd.to_numeric(compass_df.TH20) + pd.to_numeric(compass_df.TH21) + pd.to_numeric(compass_df.TH22) + pd.to_numeric(compass_df.TH23) + pd.to_numeric(compass_df.TH24) + pd.to_numeric(compass_df.TH25) + pd.to_numeric(compass_df.TH26) + pd.to_numeric(compass_df.TH27) + pd.to_numeric(compass_df.TH64)

    # Add column 'OECD_STATUS' on OECD membership vs partner

compass_df.loc[(compass_df['CountryCode'] == 'AUS') | (compass_df['CountryCode'] == 'AUT') | (compass_df['CountryCode'] == 'BEBRU') | (compass_df['CountryCode'] == 'BEFED') | (compass_df['CountryCode'] == 'BEVLG') | (compass_df['CountryCode'] == 'BEWAL') | (compass_df['CountryCode'] == 'BEWBF') | (compass_df['CountryCode'] == 'BEL') | (compass_df['CountryCode'] == 'CAN') | (compass_df['CountryCode'] == 'CHL') | (compass_df['CountryCode'] == 'COL') | (compass_df['CountryCode'] == 'CRI') | (compass_df['CountryCode'] == 'CZE') | (compass_df['CountryCode'] == 'DNK') | (compass_df['CountryCode'] == 'EST') | (compass_df['CountryCode'] == 'FIN') | (compass_df['CountryCode'] == 'FRA') | (compass_df['CountryCode'] == 'DEU') | (compass_df['CountryCode'] == 'GRC') | (compass_df['CountryCode'] == 'HUN') | (compass_df['CountryCode'] == 'ISL') | (compass_df['CountryCode'] == 'IRL') | (compass_df['CountryCode'] == 'ISR') | (compass_df['CountryCode'] == 'ITA') | (compass_df['CountryCode'] == 'JPN') | (compass_df['CountryCode'] == 'KOR') | (compass_df['CountryCode'] == 'LVA') | (compass_df['CountryCode'] == 'LTU') | (compass_df['CountryCode'] == 'LUX') | (compass_df['CountryCode'] == 'MEX') | (compass_df['CountryCode'] == 'NLD') | (compass_df['CountryCode'] == 'NZL') | (compass_df['CountryCode'] == 'NOR') | (compass_df['CountryCode'] == 'POL') | (compass_df['CountryCode'] == 'PRT') | (compass_df['CountryCode'] == 'SVK') | (compass_df['CountryCode'] == 'SVN') | (compass_df['CountryCode'] == 'ESP') | (compass_df['CountryCode'] == 'SWE') | (compass_df['CountryCode'] == 'CHE') | (compass_df['CountryCode'] == 'TUR') | (compass_df['CountryCode'] == 'GBR') | (compass_df['CountryCode'] == 'USA'), 'OECD_STATUS'] = 'Member'  
compass_df.loc[(compass_df['CountryCode'] != 'AUS') & (compass_df['CountryCode'] != 'AUT') & (compass_df['CountryCode'] != 'BEBRU') & (compass_df['CountryCode'] != 'BEFED') & (compass_df['CountryCode'] != 'BEVLG') & (compass_df['CountryCode'] != 'BEWAL') & (compass_df['CountryCode'] != 'BEWBF') & (compass_df['CountryCode'] != 'BEL') & (compass_df['CountryCode'] != 'CAN') & (compass_df['CountryCode'] != 'CHL') & (compass_df['CountryCode'] != 'COL') & (compass_df['CountryCode'] != 'CRI') & (compass_df['CountryCode'] != 'CZE') & (compass_df['CountryCode'] != 'DNK') & (compass_df['CountryCode'] != 'EST') & (compass_df['CountryCode'] != 'FIN') & (compass_df['CountryCode'] != 'FRA') & (compass_df['CountryCode'] != 'DEU') & (compass_df['CountryCode'] != 'GRC') & (compass_df['CountryCode'] != 'HUN') & (compass_df['CountryCode'] != 'ISL') & (compass_df['CountryCode'] != 'IRL') & (compass_df['CountryCode'] != 'ISR') & (compass_df['CountryCode'] != 'ITA') & (compass_df['CountryCode'] != 'JPN') & (compass_df['CountryCode'] != 'KOR') & (compass_df['CountryCode'] != 'LVA') & (compass_df['CountryCode'] != 'LTU') & (compass_df['CountryCode'] != 'LUX') & (compass_df['CountryCode'] != 'MEX') & (compass_df['CountryCode'] != 'NLD') & (compass_df['CountryCode'] != 'NZL') & (compass_df['CountryCode'] != 'NOR') & (compass_df['CountryCode'] != 'POL') & (compass_df['CountryCode'] != 'PRT') & (compass_df['CountryCode'] != 'SVK') & (compass_df['CountryCode'] != 'SVN') & (compass_df['CountryCode'] != 'ESP') & (compass_df['CountryCode'] != 'SWE') & (compass_df['CountryCode'] != 'CHE') & (compass_df['CountryCode'] != 'TUR') & (compass_df['CountryCode'] != 'GBR') & (compass_df['CountryCode'] != 'USA'), 'OECD_STATUS'] = 'Partner'  

    # Generate an order and assigne a coefficient for budget brackets:

Fundinggrid={'YearlyBudgetRange':["Don't know", 'Not applicable','Less than 1M','1M-5M','5M-20M','20M-50M','50M-100M','100M-500M','More than 500M'],'coefficient':[0,0,0.005,0.01,0.05,0.2,0.5,1,5],'order':[7,8,0,1,2,3,4,5,6],'isbudgeted':[0,1,1,1,1,1,1,1,1]}
Fundinggrid=pd.DataFrame(Fundinggrid)

    # Add the budget ordering to main dataframe and compute "NBW" (net-budget-weighted) for each datafrme entry 
compass_df = pd.merge(compass_df, 
                      Fundinggrid, 
                      on ='YearlyBudgetRange', 
                      how ='inner')
compass_df.rename(columns={'Unnamed: 0': 'OriginalID'}, inplace=True)
compass_df['NBW'] = compass_df['coefficient'] + 1

    # Filter out the columns we dont need:

compass_df = compass_df[['InitiativeID', 'OECD_STATUS', 'CountryLabel', 'CountryCode', 'StartDateYear', 'EndDateYear', 'NameResponsibleOrganisation', 'InstrumentTypeLabel', 'InstrumentID', 'YearlyBudgetRange', 'YearlyBudgetLocalCurrency', 'TG20', 'TG21', 'TG22', 'TG9', 'TG11', 'TG10', 'TG38', 'TG12', 'TG13', 'F38:V4', 'F38:V5', 'F40:V7', 'F40:V8', 'F40:V9', 'F40:V10', 'F40:V11', 'F40:V12', 'F40:V14', 'F40:V13', 'F40:V15', 'F40:V16', 'F40:V17', 'F40:V18', 'F40:V668', 'F40:V800', 'F40:V801', 'F40:V669', 'F41:V28', 'F41:V29', 'F41:V30', 'F76:V33', 'F76:V34', 'F76:V35', 'F76:V804', 'F76:V805', 'F77:V37', 'F77:V38', 'F77:V39', 'F77:V40', 'F77:V806', 'F77:V41', 'F78:V43', 'F78:V44', 'F78:V45', 'F78:V46', 'F78:V807', 'F78:V808', 'F78:V47', 'F78:V48', 'F78:V49', 'F78:V50', 'F67:V145', 'F67:V146', 'F67:V147', 'F67:V148', 'F68:V150', 'F68:V839', 'F68:V151', 'F116:V823', 'F116:V824', 'F116:V825', 'F116:V826', 'F116:V827', 'F3:V687', 'F3:V830', 'F3:V90', 'F3:V91', 'F3:V92', 'F3:V93', 'F3:V94', 'F3:V688', 'F3:V95', 'F3:V96', 'F4:V98', 'F4:V99', 'F4:V100', 'F4:V101', 'F4:V689', 'F120:V856', 'F120:V857', 'F120:V858', 'F120:V859', 'F120:V860', 'F120:V861', 'F120:V862', 'PRS', 'coefficient', 'NBW', 'isbudgeted']].copy()

# Country selection: What country should the analysis be on?

    # Select your countries (CountryLabel)

my_countries = ("Korea", "Japan", "Germany", "France", "Canada")
my_countries = pd.DataFrame(my_countries)
my_countries = my_countries.rename(columns = {0: "CountryLabel"})
my_countries = my_countries.sort_values(["CountryLabel"]).copy()

    # Filter the list of countries CountryLabel/CountryCode
all_countries = compass_df[['CountryLabel', 'CountryCode']].copy()
all_countries = all_countries[['CountryLabel', 'CountryCode']].drop_duplicates().reset_index(inplace = False)
all_countries["Aggregate"] = "No"
all_countries = all_countries[['CountryLabel', 'CountryCode', 'Aggregate']]

    # Import Countrycode in my_countries - this dataframe will be called later in coordination with the main dataframe
my_countries = pd.merge(my_countries, 
                      all_countries, 
                      on ='CountryLabel', 
                      how ='inner')
new_row = pd.DataFrame([['OECD Median', 'OECD', 'Yes'], ['OECD Mean', 'OECD2', 'Yes']], columns=['CountryLabel', 'CountryCode', 'Aggregate'])
my_countries = my_countries.append(new_row, ignore_index=True)
all_countries = all_countries.append(new_row, ignore_index=True)

3. Overview of public research system policies¶

This section summarises the policies reported under the public research system (PRS) policy area, one of the six core areas covered by the STIP Compass dataset. Focusing the analysis on the selected five countries, it includes:

A table overview of the number of PRS initiatives and the percentage share of those that have funding data
A treemap showing the distribution of initiatives by responsible organisation
A scatter plot depicting the concentration of initiatives by organisation
A stacked bar chart of the distribution of initiatives by budget bracket
A bullet graph displaying the beneficiaries of PRS initiatives

3.1. Overview of PRS initiatives and funding¶

Table 2 displays of the number of PRS initiatives across the 5 countries, together with the share of initiatives that have budget data. The selected 5 countries have budgetary information for the majority of policies, though it is missing in 35% of policies reported by Canada. The degree of reporting (number of initiatives) by country varies widely, ranging from Korea reporting 23 initiatives up to Germany reporting 66 initiatives. To account for this, calculations made in this notebook are normalised by the country’s number of initiatives. However, analysing this data requires a deep understanding of what the country has reported and how comprehensive and representative they are. Such an assessment would need inputs from national experts. Such validation falls out of the scope of this notebook.

# Data preparation:

    # Copy and filter the central repository:
compass_dfTreemap = compass_df[["OECD_STATUS", "InitiativeID", "CountryLabel", "CountryCode", "YearlyBudgetRange", "PRS"]].copy()

    # Remove non-PRS data:
index_names = compass_dfTreemap[compass_dfTreemap['PRS'] < 0.3 ].index
compass_dfTreemap = compass_dfTreemap.drop(index_names, inplace = False)

    # Generate a table summing 'InitiativeID', 'NBW' and 'coefficient' by 'YearlyBudgetRange' for each 'NameResponsibleOrganisation' in each 'CountryLabel':
compass_df_pivot = pd.pivot_table(data=compass_dfTreemap, index=['OECD_STATUS', 'CountryCode', 'CountryLabel','YearlyBudgetRange'], values=['InitiativeID'], aggfunc={'InitiativeID': pd.Series.nunique}, margins=True, margins_name='total', fill_value=0).reset_index()

    # Add details on budget
compass_df_pivot = pd.merge(compass_df_pivot, 
                      Fundinggrid, 
                      on ='YearlyBudgetRange', 
                      how ='inner')
compass_df_pivot = compass_df_pivot.sort_values(by='order')
compass_df_pivot = compass_df_pivot.drop(['coefficient', 'order'], axis=1)

compass_df_pivot['isbudgeted'] = compass_df_pivot['isbudgeted'] * compass_df_pivot['InitiativeID']

# aggregate by country, removing "YearlyBudgetRange"

compass_df_pivot = pd.pivot_table(data=compass_df_pivot, index=['OECD_STATUS', 'CountryCode', 'CountryLabel'], values=['InitiativeID', 'isbudgeted'], aggfunc={'InitiativeID': pd.Series.sum, 'isbudgeted': pd.Series.sum}, margins=True, margins_name='total', fill_value=0).reset_index()

# Get details for OECD Mean

    # Filter out non-OECD countries
compass_df_pivot_OECD = compass_df_pivot[(compass_df_pivot.OECD_STATUS == "Member")]

    # Compute median in an extra row and append the row to dataframe
new_row = {'OECD_STATUS':'Member', 'CountryLabel':'OECD Mean', 'CountryCode':'OECD2', 'InitiativeID': compass_df_pivot_OECD['InitiativeID'].mean(), 'isbudgeted': compass_df_pivot_OECD['isbudgeted'].mean()}
compass_df_pivot = compass_df_pivot.append(new_row, ignore_index=True)


# Prepare the dataframe for table inclusion:

    # Select the countries to analyse
compass_df_pivot = compass_df_pivot.loc[compass_df_pivot["CountryCode"].isin(list(my_countries["CountryCode"]))]
compass_df_pivot['InitiativeID'] = round(compass_df_pivot['InitiativeID'].astype(float),2)


    # adjust the format to integer / percentage
compass_df_pivot['InitiativeID'] = round(compass_df_pivot['InitiativeID'].astype(float),0)
compass_df_pivot['InitiativeID'] = (compass_df_pivot['InitiativeID'].astype(int))
compass_df_pivot['isbudgeted'] = (compass_df_pivot['isbudgeted']/compass_df_pivot['InitiativeID'])
compass_df_pivot = compass_df_pivot.sort_values(by='isbudgeted', ascending=False)
compass_df_pivot['isbudgeted'] = compass_df_pivot['isbudgeted'].map(lambda n: '{:,.1%}'.format(n))

    # Trim to only show total initiatives and total initiatives with budget data
compass_df_pivot = compass_df_pivot.drop(['CountryCode', 'OECD_STATUS'], axis=1)
compass_df_pivot = compass_df_pivot.rename(columns = {"CountryLabel" : "Country", "isbudgeted" : "Total initiatives with budget data", "InitiativeID" : "Total initiatives"} )


# Generate a table
import plotly.figure_factory as ff

colorscale = [[0, '#272D31'],[.5, '#ffffff'],[1, '#ffffff']]

fig =  ff.create_table(compass_df_pivot, colorscale=colorscale)
fig.layout.width=800
fig.update_layout(title_text="Table 2: Total PRS initiatives and funding rates by country",
            title_font_family="verdana",
            title_font_color='#2a3f5f',                            
            title_font_size=15)
fig.update_layout({'margin':{'t':40}})
fig.show()

3.2. Distribution of initiatives by responsible organisation¶

Figure 1 shows the distribution of public organisations or bodies responsible for funding or managing PRS policy initiatives. The distribution of public bodies for each country is represented by squares, each corresponding to one body. For each of them, the square size is proportional to the number of PRS initiatives reported, as a share of the country’s total. This normalisation makes each countries' total area the same, despite numbers of initiatives and budgets reported varying by country (as per Table 2). The NBW initiatives by organisation is reported in the colour scale, from light yellow (lowest NBW) to dark red (highest NBW). Public bodies with missing funding data will be reported in light yellow along those with no funding. The colour scale is computed as the logarithme of aggregate organisations' NBW value. As the NBW initiatives tend to be highly concentrated in a few public bodies, using the logarithm in the colour scale helps to better distinguish the differences in budget allocations.

# Data preparation

    # Copy and filter the central repository:
compass_dfTreemap = compass_df[["OECD_STATUS", "InitiativeID", "CountryLabel", "CountryCode", "NameResponsibleOrganisation", "YearlyBudgetRange", "coefficient", "NBW", "PRS"]].copy()

    # Remove non-PRS data:
index_names = compass_dfTreemap[compass_dfTreemap['PRS'] < 0.3 ].index
compass_dfTreemap = compass_dfTreemap.drop(index_names, inplace = False)

    # Generate a table summing 'InitiativeID', 'NBW' and 'coefficient' by 'YearlyBudgetRange' for each 'NameResponsibleOrganisation' in each 'CountryLabel':
compass_df_pivot = pd.pivot_table(data=compass_dfTreemap, index=['OECD_STATUS', 'CountryCode', 'CountryLabel','NameResponsibleOrganisation','YearlyBudgetRange', 'InitiativeID'], values=['coefficient', 'NBW'], aggfunc={'coefficient': pd.Series.mean, 'NBW': pd.Series.mean}, margins=False, margins_name='total', fill_value=0).reset_index()
compass_df_pivot["Unique"] = 1
compass_df_pivot = compass_df_pivot.drop_duplicates()



    # Filter out non-OECD countries
compass_df_pivot = compass_df_pivot[(compass_df_pivot.OECD_STATUS == "Member")]


    # Select the countries to analyse
compass_df_pivot_filtered = compass_df_pivot.loc[compass_df_pivot["CountryLabel"].isin(list(my_countries["CountryLabel"]))]
compass_df_pivot_filtered = compass_df_pivot_filtered.reset_index(inplace = False)
compass_df_pivot_filtered = compass_df_pivot_filtered.drop(['index'], axis=1)

for x in range(0, compass_df_pivot_filtered.shape[0]):
    if compass_df_pivot_filtered["NameResponsibleOrganisation"].iloc[x] == "":
        compass_df_pivot_filtered["NameResponsibleOrganisation"] = "Organisation not provided"
            
            
    # Split responsible organisation fields when multiple organisation flagged:
test = compass_df_pivot_filtered[["NameResponsibleOrganisation","InitiativeID"]].copy()
test2 = test[['NameResponsibleOrganisation']].applymap(lambda x: str.count(x, '¬'))
test3 = test2['NameResponsibleOrganisation'].max()

test['total_org'] = test3
for x in range(0, test3):
    y = str(x)
    test['Name'+y] = test['NameResponsibleOrganisation'].str.split('¬', expand=True)[x]
for z in range(0, test.shape[0]):
    for x in range(0, test3):
        y = str(x)
        if test['Name'+y].loc[z] is None:
            test['total_org'].iloc[z] = test['total_org'].iloc[z] - 1
test = test.drop(['NameResponsibleOrganisation'], axis=1)

compass_df_pivot_filtered = pd.merge(compass_df_pivot_filtered, 
                      test, 
                      on ='InitiativeID', 
                      how ='inner')
compass_df_pivot_filtered['Unique'] = compass_df_pivot_filtered['Unique']# / compass_df_pivot_filtered['total_org']
compass_df_pivot_filtered['NBW'] = compass_df_pivot_filtered['NBW']# / compass_df_pivot_filtered['total_org']
compass_df_pivot_filtered = compass_df_pivot_filtered.drop(['total_org'], axis=1)

new_header= list(compass_df_pivot_filtered.columns[9:].values)
compass_df_pivot_filtered = compass_df_pivot_filtered.melt(id_vars=['OECD_STATUS', 'CountryCode', 'CountryLabel', 'YearlyBudgetRange', 'Unique', 'NBW', 'coefficient'], value_vars=new_header, var_name='Variable') #, value_name='InitiativeID')
compass_df_pivot_filtered = compass_df_pivot_filtered.drop(['Variable'], axis=1)
compass_df_pivot_filtered = compass_df_pivot_filtered.rename(columns = {"value" : "NameResponsibleOrganisation"})
compass_df_pivot_filtered = compass_df_pivot_filtered.dropna()

    # Generate a table summing 'Unique', 'NBW' and 'coefficient' by 'YearlyBudgetRange' for each 'NameResponsibleOrganisation' in each 'CountryLabel':
compass_df_pivot_filtered = pd.pivot_table(data=compass_df_pivot_filtered, index=['OECD_STATUS', 'CountryCode', 'CountryLabel','NameResponsibleOrganisation','YearlyBudgetRange'], values=['Unique', 'NBW'], aggfunc={'Unique': sum, 'NBW': sum}, margins=True, margins_name='total', fill_value=0).reset_index()
compass_df_pivot_filtered = compass_df_pivot_filtered.rename(columns = {"Unique" : "InitiativeID"})

compass_df_pivot_filtered.dtypes

# Generate the graph

    # Select data for the graph
levels = ['NameResponsibleOrganisation'] # levels used for the hierarchical chart
color_columns = ['InitiativeID', 'NBW']
value_column = ['InitiativeID']

    # Lets use a dictionary to generate x dataframes
my_dict = {}
for x in range(0, my_countries[my_countries.Aggregate == "No"].shape[0]):
    my_dict[x] = "df_all_trees_" + str(x)

    # Generate the graph for each country
    
# Introduce the function "build_hierarchical_dataframe2"

def build_hierarchical_dataframe2(compass_df_pivot_filtered, levels, value_column, color_columns):
    """
    Build a hierarchy of levels for Treemap charts.
    Levels are given starting from the bottom to the top of the hierarchy,
    ie the last level corresponds to the root.
    """
    my_dict[x] = pd.DataFrame(columns=['id', 'parent', 'value', 'color'])
    for i, level in enumerate(levels):
        df_tree = pd.DataFrame(columns=['id', 'parent', 'value', 'color'])
        dfg = compass_df_pivot_filtered[compass_df_pivot_filtered.CountryCode == my_countries.CountryCode[x]].groupby(levels[i:], sort=False).sum()
        dfg = dfg.reset_index()
        df_tree['id'] = dfg[level].copy()
        if i < len(levels) - 1:
            df_tree['parent'] = dfg[levels[i+1]].copy()
        else:
            df_tree['parent'] = my_countries[my_countries.Aggregate == "No"].CountryLabel[x]
        df_tree['value'] = dfg[value_column]
        df_tree['color'] = dfg[color_columns[1]] #/ dfg[color_columns[0]]
        my_dict[x] = my_dict[x].append(df_tree, ignore_index=True)
    total = pd.Series(dict(id=my_countries[my_countries.Aggregate == "No"].CountryLabel[x], parent='',
                              value=compass_df_pivot_filtered[value_column].sum(),
                              #color=(compass_df_pivot_filtered[color_columns[0].sum()
                              color=(compass_df_pivot_filtered[color_columns[1]] / compass_df_pivot_filtered[color_columns[0]]).sum()))
    my_dict[x] = my_dict[x].append(total, ignore_index=True)
    return my_dict[x]

for x in range(0, my_countries[my_countries.Aggregate == "No"].shape[0]):
    my_dict[x] = build_hierarchical_dataframe2(compass_df_pivot_filtered, levels, value_column, color_columns)
    
fig = make_subplots(1, my_countries[my_countries.Aggregate == "No"].shape[0], specs=[[{"type": "domain"}, {"type": "domain"}, {"type": "domain"}, {"type": "domain"}, {"type": "domain"}]],)

for x in range(0, my_countries[my_countries.Aggregate == "No"].shape[0]):
    
    my_dict[x]['color'] = np.log(my_dict[x]['color'])
    
    fig.add_trace(go.Treemap(
        labels=my_dict[x]['id'], #[:100],
        parents=my_dict[x]['parent'],
        values=my_dict[x]['value'],
        branchvalues='total',
        marker=dict(
            colors=my_dict[x]['color'],
            colorscale='YlOrRd',
            showscale=False),
        hovertemplate='</b> Initiatives: %{value}<br> log(NBW): %{color:.3f}<br>',
        maxdepth=5,
        textfont=dict(size=15),# color=df_all_trees['color_two']),
        name=''
        ), 1, x+1)

#fig.update_traces(showscale=False)

fig.update_layout(margin = dict(t=100, l=0, r=0, b=0), title_text="Figure 1: Distribution of STI public bodies (by initiatives and their budget)",
                  title_font_size=17, title_font_family="verdana")
fig.show()

Figure 1 shows that Germany, Korea and to some extent France have a very large organisations centralising a large portion of PRS initiatives and funding. By comparison, Japan's and Canada’s PRS initiatives and funding are less concentrated. The figure also shows that Canada, France and Germany have a larger number of public bodies in the PRS policy landscape, compared to Japan and Korea. It is important to note that, in a few instances, the same organisation has been reported using different labels. This does not affect the results of the chart, but if this analysis should be replicated and extended across other countries some harmonisation would be necessary.

3.3. Concentration of PRS initiatives by organisation¶

The scatter plot in Figure 2 displays the average number of initiatives for each responsible public body within a given country against the total number of bodies in the country. If one initiative involves more than one responsible organisation, this initiative will be counted under both organisations. The figure gives additional perspective on the concentration of PRS initiatives for the five selected countries compared to the rest of OECD countries.

# Copy and filter the central repository:
compass_df_pivot = compass_df[["OECD_STATUS", "InitiativeID", "CountryLabel", "CountryCode", "NameResponsibleOrganisation", "YearlyBudgetRange", "coefficient", "NBW", "PRS"]].copy()

# Remove non-PRS data:
index_names = compass_df_pivot[compass_df_pivot['PRS'] < 0.3 ].index
compass_df_pivot = compass_df_pivot.drop(index_names, inplace = False)

# Generate a table summing 'InitiativeID', 'NBW' and 'coefficient' by 'YearlyBudgetRange' for each 'NameResponsibleOrganisation' in each 'CountryLabel':
compass_df_pivot = pd.pivot_table(data=compass_df_pivot, index=['OECD_STATUS', 'CountryCode', 'CountryLabel','NameResponsibleOrganisation','YearlyBudgetRange', 'InitiativeID'], values=['coefficient', 'NBW'], aggfunc={'coefficient': pd.Series.mean, 'NBW': pd.Series.mean}, margins=False, margins_name='total', fill_value=0).reset_index()
compass_df_pivot["Unique"] = 1
compass_df_pivot = compass_df_pivot.drop_duplicates()

# Filter out non-OECD countries
compass_df_pivot = compass_df_pivot[(compass_df_pivot.OECD_STATUS == "Member")]

# Split responsible organisation fields when multiple organisation flagged:
    
test = compass_df_pivot[["NameResponsibleOrganisation","InitiativeID"]].copy()
test2 = test[['NameResponsibleOrganisation']].applymap(lambda x: str.count(x, '¬'))
test3 = test2['NameResponsibleOrganisation'].max()
test['total_org'] = test3

for x in range(0, test3):
    y = str(x)
    test['Name'+y] = test['NameResponsibleOrganisation'].str.split('¬', expand=True)[x]
for z in range(0, test.shape[0]):
    for x in range(0, test3):
        y = str(x)
        if test['Name'+y].loc[z] is None:
            test['total_org'].iloc[z] = test['total_org'].iloc[z] - 1
test = test.drop(['NameResponsibleOrganisation'], axis=1)

compass_df_pivot = pd.merge(compass_df_pivot, 
                      test, 
                      on ='InitiativeID', 
                      how ='inner')
#compass_df_pivot['Unique'] = compass_df_pivot['Unique'] / compass_df_pivot['total_org']
compass_df_pivot = compass_df_pivot.drop(['total_org'], axis=1)

new_header= list(compass_df_pivot.columns[9:].values)
compass_df_pivot = compass_df_pivot.melt(id_vars=['OECD_STATUS', 'CountryCode', 'CountryLabel', 'YearlyBudgetRange', 'Unique', 'InitiativeID','NBW', 'coefficient'], value_vars=new_header, var_name='Variable') #, value_name='InitiativeID')
compass_df_pivot = compass_df_pivot.drop(['Variable'], axis=1)
compass_df_pivot = compass_df_pivot.rename(columns = {"value" : "NameResponsibleOrganisation"})
compass_df_pivot = compass_df_pivot.dropna()

# Data preparation:

    # First retrieve a table with count of initiatives per single organisation for each country 
compass_df_table = pd.pivot_table(data=compass_df_pivot, index=['CountryCode', 'CountryLabel', 'NameResponsibleOrganisation'], values=['Unique'], aggfunc={'Unique': pd.Series.sum}, margins=True, margins_name='total', fill_value=0).reset_index()
compass_df_table.head()


    # Remove the data not related to a country (OECD, total, ..)
compass_df_table = compass_df_table[compass_df_table.CountryLabel != ""]

    # Generate a second table by country, summing up the count of organisation and the average number of initiative by organisation for each country:
compass_df_table = pd.pivot_table(data=compass_df_table, index=['CountryCode', 'CountryLabel'], values=['Unique', 'NameResponsibleOrganisation'], aggfunc={'Unique': pd.Series.mean, 'NameResponsibleOrganisation': pd.Series.nunique}, margins=True, margins_name='total', fill_value=0).reset_index()


    # Compute OECD Median
new_row = {'CountryCode':'OECD', 'CountryLabel':'OECD Median', 'Unique': compass_df_table['Unique'].median(), 'NameResponsibleOrganisation': compass_df_table['NameResponsibleOrganisation'].median()}
    # Append the OECD data to the dataframes
compass_df_table = compass_df_table.append(new_row, ignore_index=True)

    # Remove the data not related to a country (total, ..)
compass_df_table = compass_df_table[compass_df_table.CountryLabel != ""]

    # Add a filter on total initiatived by country to filter out outliers - ex: 15 initiatives
compass_df_table['TotalInit'] = compass_df_table['Unique'] * compass_df_table['NameResponsibleOrganisation']

    # Drop countries below 15 Initiatives
compass_df_table = compass_df_table[compass_df_table.TotalInit > 15]
compass_df_table = compass_df_table[compass_df_table.CountryLabel != "Italy"]


    # Introduce CountryLabel2 which will take the value of CountryLabel for the country we want to analyse or "" otherwise
compass_df_table['CountryLabel2'] = compass_df_table['CountryLabel'].isin(list(my_countries.CountryLabel))
compass_df_table.loc[compass_df_table['CountryLabel2'] == True, 'CountryLabel2'] = compass_df_table['CountryLabel']
compass_df_table.loc[compass_df_table['CountryLabel2'] == False, 'CountryLabel2'] = ""

    # Round the average number of initiatives per organisation:
compass_df_table['Unique'] = round(compass_df_table['Unique'],2) 

# Generate the graph:

import plotly.graph_objects as go

Unique=compass_df_table["Unique"],
NameResponsibleOrganisation=compass_df_table["NameResponsibleOrganisation"],

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=compass_df_table["Unique"],
    y=compass_df_table["NameResponsibleOrganisation"],
    text="",
    name="",
    textposition='top center',
    mode='markers+text',
    marker=dict(
        color='rgba(156, 165, 196, 0.95)',
    )
))

fig.add_trace(go.Scatter(
    x=compass_df_table[compass_df_table.CountryLabel2 != ""].Unique,
    y=compass_df_table[compass_df_table.CountryLabel2 != ""].NameResponsibleOrganisation,
    text=compass_df_table[compass_df_table.CountryLabel2 != ""].CountryLabel2,
    name="", 
    textposition='top center',
    mode='markers+text',
    marker=dict(
        color='rgba(255, 0, 0, 0.95)',
    )
))

fig.update_layout(
    title={
        'text': "Figure 2: Average distribution of public research system initiatives by organisation",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
)

fig.update_xaxes(
    title_text='Average initiatives per organisation',
    tickvals=[1,2,3,4,5,6,7,8,9],
    rangemode='tozero',
    zeroline=True
)
fig.update_layout(showlegend=False)

fig.update_yaxes(
    title_text='Number of organisation',
    rangemode='tozero',
    zeroline=True
)
fig.show()

France, Germany and Canada have nearly double the OECD median number of organisations involved in PRS initiatives, while Korea and Japan have slightly fewer. Out of the five selected countries, four of them have an average number of initiatives per organisation similar to that of the OECD median. The exception of France, have an average of 3 initiatives per organisation, on par with OECD median. As for Figure 1, any inconsistencies in public organisation labels would be reflected here. A single organisation reported with different names would be double. In addition, some countries have too few initiatives (namely Iceland and Slovak Republic) or missing data (namely Italy) and are thus excluded from the chart.

3.4. Distribution of PRS initiatives by budget bracket¶

Figure 3 shows countries PRS policy initiatives funding profile. It represents the distribution of initiatives by budget range for selected countries and the OECD median, as a % of total initiatives within the country/median. This figure excludes initiatives without budgetary data, whether the information is missing or whether the initiative is reported not to have any budget.

# Data preparation:

    # Filter on data needed and aggregate "dont know" and "not applicable" budget categories
compass_df_comparison = compass_df[["OECD_STATUS", "InitiativeID", "CountryCode", "NameResponsibleOrganisation", "YearlyBudgetRange", "coefficient", "NBW", "PRS"]].copy()
compass_df_comparison['YearlyBudgetRange'] = compass_df_comparison['YearlyBudgetRange'].replace(["Don't know"],'Not applicable')

    # Remove non-PRS initiatives
index_names = compass_df_comparison[compass_df_comparison['PRS'] < 1 ].index
compass_df_comparison = compass_df_comparison.drop(index_names, inplace = False)

# Complement with OECD data:
    # Generate a dataframe filtered on OECD member only:
compass_df_pivot3a = pd.pivot_table(data=compass_df_comparison[compass_df_comparison["OECD_STATUS"] == "Member"], index=['YearlyBudgetRange'], values=['InitiativeID'], columns=['CountryCode'], aggfunc={'InitiativeID': pd.Series.nunique}, margins_name='total', fill_value=0).reset_index()
compass_df_pivot3a.columns = compass_df_pivot3a.columns.droplevel(-2)
compass_df_pivot3a = compass_df_pivot3a.rename(columns = {"" : "YearlyBudgetRange"})
compass_df_pivot3a['OECD'] = compass_df_pivot3a.median(axis=1)
compass_df_pivot3a['OECD2'] = compass_df_pivot3a.mean(axis=1)

    # Retrieve the pivot table for all countries
compass_df_pivot3 = pd.pivot_table(data=compass_df_comparison, index=['YearlyBudgetRange'], values=['InitiativeID'], columns=['CountryCode'], aggfunc={'InitiativeID': pd.Series.nunique}, margins_name='total', fill_value=0).reset_index()
compass_df_pivot3.columns = compass_df_pivot3.columns.droplevel(-2)

    # Add OECD Median/Mean
compass_df_pivot3['OECD'] = compass_df_pivot3a['OECD']
#compass_df_pivot3['OECD2'] = compass_df_pivot3a['OECD2']

    # Format the table
compass_df_pivot3 = compass_df_pivot3.rename(columns = {"" : "YearlyBudgetRange"})
compass_df_pivot3 = pd.merge(compass_df_pivot3, 
                      Fundinggrid, 
                      on ='YearlyBudgetRange', 
                      how ='inner')
compass_df_pivot3 = compass_df_pivot3.sort_values(by='order')
compass_df_pivot3 = compass_df_pivot3.drop(['coefficient', 'order'], axis=1)

# Prepare the dataframe for graph inclusion:
    # Optional: remove "Not applicable"
compass_df_pivot3 = compass_df_pivot3[compass_df_pivot3.YearlyBudgetRange != "Not applicable"].copy()

    # Compute proportion of total initiatives
for x in range(1, compass_df_pivot3.shape[1]):
       compass_df_pivot3.iloc[:,x] = compass_df_pivot3.iloc[:,x]/ compass_df_pivot3.iloc[:,x].sum()

    # Transpose the data for the stacked bar graph by country
compass_df_pivot3 = compass_df_pivot3.T
new_header = compass_df_pivot3.iloc[0]
compass_df_pivot3 = compass_df_pivot3[1:] 
compass_df_pivot3.columns = new_header
compass_df_pivot3 = compass_df_pivot3 * 100

    # Format the column name
compass_df_pivot3.index = compass_df_pivot3.index.set_names('YearlyBudgetRange')
compass_df_pivot3 = compass_df_pivot3.reset_index()
compass_df_pivot3.rename(columns = {"YearlyBudgetRange":"CountryCode"}, inplace="False")

    # Filter on my_countries and add back CountryLabel
compass_df_pivot3 = pd.merge(compass_df_pivot3, 
                      all_countries, 
                      on ='CountryCode', 
                      how ='inner')
compass_df_pivot3 = compass_df_pivot3.drop(['CountryCode', 'Aggregate'], axis=1)
compass_df_pivot3 = pd.merge(compass_df_pivot3, 
                      my_countries, 
                      on ='CountryLabel', 
                      how ='inner')

# Generate the graph:

import pygal

    # Setup HTML for Pygal graphs
%matplotlib inline
from IPython.display import SVG, HTML

html_pygal = """
<!DOCTYPE html>
<html>
  <head>
  <script type="text/javascript" src="http://kozea.github.com/pygal.js/javascripts/svg.jquery.js"></script>
  <script type="text/javascript" src="http://kozea.github.com/pygal.js/javascripts/pygal-tooltips.js"></script>
    <!-- ... -->
  </head>
  <body>
    <figure>
      {pygal_render}
    </figure>
  </body>
</html>
"""



    #define the color code
from bokeh.palettes import YlOrRd
colors = YlOrRd[9]
colors.reverse()

from pygal.style import Style
custom_style = Style(
  background='transparent',
  plot_background='transparent',
  foreground='#000000', #53E89B',
  foreground_strong='#000000', #53A0E8',
  foreground_subtle='#630C0D',
  opacity='.6',
  opacity_hover='.9',
  transition='400ms ease-in',
  colors=colors)


line_chart = pygal.StackedBar(style=custom_style)

line_chart.title = 'Figure 3: Distribution of initiatives by budget bracket (in EUR)'
line_chart.x_labels = compass_df_pivot3['CountryLabel']
line_chart.y_labels = [
  {'label': '10%', 'value': 10},
  {'label': '20%', 'value': 20},
  {'label': '30%', 'value': 30},
  {'label': '40%', 'value': 40},
  {'label': '50%', 'value': 50},
  {'label': '60%', 'value': 60},
  {'label': '70%', 'value': 70},
  {'label': '80%', 'value': 80},
  {'label': '90%', 'value': 90},
  {'label': '100%', 'value': 100}]

for x in range(0, compass_df_pivot3.shape[1]-3):
    line_chart.add(compass_df_pivot3.iloc[:,x].name, compass_df_pivot3.iloc[:,x].astype(float))

HTML(html_pygal.format(pygal_render=line_chart.render()))

Figure 3 allows comparing the funding profile of the five selected countries and the OECD median. All five countries have larger shares of initiatives in the two largest budget ranges (EUR 100M or more). Medium-sized initiatives (20 to 100M EUR) are more prevalent in Canada, Germany and Japan, while they are relatively less frequent in France. The French profile is characterised by a mix of large initiatives (>100M EUR) and small ones(<20M EUR). Korea appears to be an outlier with 80% of initiatives reported with a budget over 100M EUR. Data for this country should be further inspected to understand this funding profile.

3.5. Target groups of PRS initiatives¶

The final graph in the overview of PRS policy initiatives focuses on the target groups or the direct recipients of funding (Figure 5). Initiatives generally target individual researchers (including primarily established researchers, postdocs and researchers), organisations (public research institutes or higher education institutes), or a mix of both. To this end, a value is assigned for each initiative: 0 when targeting individual researchers only, 1 when targeting organisations exclusively, and 0.5 when directed towards a mix of both. An aggregate ratio by country is then computed by aggregating these values and the associated initiatives’ budget coefficients, to summarise each country's tendency in terms of these two types of beneficiaries. The colour shading indicates the extent to which the reported initiatives have budget information, with dark red signalling funding details available for all, and light-yellow evidencing largely missing funding details. Finally, the total number of initiatives is indicated next to each country label.

# Data preparation:
    # Select data from central dataframe, and summarise target type in two variables, TargetOrganisations and TargetIndividuals 
compass_df_TargetFunding = compass_df.copy()
compass_df_TargetFunding['TargetOrganisations'] = compass_df_TargetFunding['TG20'] + compass_df_TargetFunding['TG21'] + compass_df_TargetFunding['TG22']
compass_df_TargetFunding['TargetIndividuals'] = compass_df_TargetFunding['TG9'] + compass_df_TargetFunding['TG10'] + compass_df_TargetFunding['TG11'] + compass_df_TargetFunding['TG12'] + compass_df_TargetFunding['TG13'] + compass_df_TargetFunding['TG38']
compass_df_TargetFunding['TargetOrganisations'] = compass_df_TargetFunding['TargetOrganisations'].astype(float)
compass_df_TargetFunding['TargetIndividuals'] = compass_df_TargetFunding['TargetIndividuals'].astype(float)

    # Remove non PRS data
index_names = compass_df_TargetFunding[compass_df_TargetFunding['PRS'] < 0.3 ].index
compass_df_TargetFunding = compass_df_TargetFunding.drop(index_names, inplace = False)

    # Compute a NBW-weighted value for both TargetOrganisations and TargetIndividuals
compass_df_TargetFunding['TargetOrganisations'] = (compass_df_TargetFunding['TargetOrganisations']/compass_df_TargetFunding['TargetOrganisations']*compass_df_TargetFunding['NBW']).astype(float)
compass_df_TargetFunding['TargetIndividuals'] = (compass_df_TargetFunding['TargetIndividuals']/compass_df_TargetFunding['TargetIndividuals']*compass_df_TargetFunding['NBW']).astype(float)
compass_df_TargetFunding['Unique'] = 1

    # Generate a table listing initiatives by country
compass_df_TargetFundingPivot = pd.pivot_table(data=compass_df_TargetFunding, index=["OECD_STATUS", 'CountryLabel', 'CountryCode', 'InitiativeID'], values=['isbudgeted','TargetOrganisations','TargetIndividuals','Unique'], aggfunc={'isbudgeted': pd.Series.mean, 'TargetOrganisations': sum, 'TargetIndividuals': sum, 'Unique': sum}, margins=True, margins_name='total', fill_value=0).reset_index()

    # Divide TargetIndividuals and TargetOrganisations by Unique to ensure Initiatives with multiple entries are not double-counted: 
compass_df_TargetFundingPivot["TargetIndividuals"] = compass_df_TargetFundingPivot["TargetIndividuals"] / compass_df_TargetFundingPivot["Unique"]
compass_df_TargetFundingPivot["TargetOrganisations"] = compass_df_TargetFundingPivot["TargetOrganisations"] / compass_df_TargetFundingPivot["Unique"]
compass_df_TargetFundingPivot = compass_df_TargetFundingPivot.fillna(0)

    # Introduce a third variable "BOTHINDORG" for initiatives targeting both individuals and organisations
        # Note: All three binaries (TargetIndividuals, TargetOrganisations,BothINDORG) must be mutually exclusive
compass_df_TargetFundingPivot['BothINDORG'] = np.where(compass_df_TargetFundingPivot['TargetIndividuals']!= compass_df_TargetFundingPivot['TargetOrganisations'], 0, compass_df_TargetFundingPivot['TargetOrganisations'])
compass_df_TargetFundingPivot['TargetIndividuals'] = compass_df_TargetFundingPivot['TargetIndividuals'] - compass_df_TargetFundingPivot['BothINDORG']
compass_df_TargetFundingPivot['TargetOrganisations'] = compass_df_TargetFundingPivot['TargetOrganisations'] - compass_df_TargetFundingPivot['BothINDORG']
compass_df_TargetFundingPivot = compass_df_TargetFundingPivot.drop(['Unique'], axis = 1)
        # Note: We now have a table listing for each initiatives, the NBW by TargetIndividuals, TargetOrganisations and BothINDORG

    # Generate a table summing NBW for each variables by country
compass_df_TargetFundingPivot2 = pd.pivot_table(data=compass_df_TargetFundingPivot, index=["OECD_STATUS", 'CountryLabel', 'CountryCode'], values=['isbudgeted', 'InitiativeID', 'TargetOrganisations','TargetIndividuals', 'BothINDORG'], aggfunc={'isbudgeted': pd.Series.mean, 'InitiativeID':pd.Series.nunique,'TargetOrganisations': sum, 'TargetIndividuals': sum, 'BothINDORG': sum}, margins=True, fill_value=0).reset_index()

# Complement with OECD data:
    # Compute OECD Median from dataframe obtained
compass_df_TargetFundingPivot2OECD = compass_df_TargetFundingPivot2.copy()

    # Select only OECD members
index_names = compass_df_TargetFundingPivot2OECD[(compass_df_TargetFundingPivot2OECD['OECD_STATUS'] != 'Member')].index
compass_df_TargetFundingPivot2OECD.drop(index_names, inplace = True)

    # Compute median in an extra row and append the row to dataframe
new_row = {'OECD_STATUS':'Member', 'CountryLabel':'OECD Median', 'CountryCode':'OECD', 'BothINDORG': compass_df_TargetFundingPivot2OECD['BothINDORG'].median(), 'TargetIndividuals': compass_df_TargetFundingPivot2OECD['TargetIndividuals'].median(), 'TargetOrganisations': compass_df_TargetFundingPivot2OECD['TargetOrganisations'].median(), 'isbudgeted':compass_df_TargetFundingPivot2OECD['isbudgeted'].median(), 'InitiativeID':compass_df_TargetFundingPivot2OECD['InitiativeID'].median()}
compass_df_TargetFundingPivot2 = compass_df_TargetFundingPivot2.append(new_row, ignore_index=True)

# generate alternative country label with funded initiative / total initiative
compass_df_TargetFundingPivot_Label = compass_df_TargetFundingPivot2[['CountryLabel', 'InitiativeID', 'isbudgeted']].copy()
compass_df_TargetFundingPivot_Label['isbudgeted'] = compass_df_TargetFundingPivot_Label['isbudgeted'] * compass_df_TargetFundingPivot_Label['InitiativeID']
compass_df_TargetFundingPivot_Label['isbudgeted'] = compass_df_TargetFundingPivot_Label['isbudgeted'].astype(int)
compass_df_TargetFundingPivot_Label['InitiativeID'] = compass_df_TargetFundingPivot_Label['InitiativeID'].astype(int)

compass_df_TargetFundingPivot_Label['CountryLabel2'] = compass_df_TargetFundingPivot_Label['CountryLabel'] + " (" + compass_df_TargetFundingPivot_Label['isbudgeted'].astype(str) + " / " + compass_df_TargetFundingPivot_Label['InitiativeID'].astype(str) + ") "
compass_df_TargetFundingPivot_Label['CountryLabel2'] = compass_df_TargetFundingPivot_Label['CountryLabel'] + " (" + compass_df_TargetFundingPivot_Label['InitiativeID'].astype(str) + ") "
compass_df_TargetFundingPivot_Label = compass_df_TargetFundingPivot_Label.drop(['InitiativeID', 'isbudgeted'], axis=1).copy()


# Prepare the dataframe for graph inclusion:
    # Reorder the columns to have variables on the right:
compass_df_TargetFundingPivot2 = compass_df_TargetFundingPivot2.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'isbudgeted', 'BothINDORG', 'TargetIndividuals', 'TargetOrganisations'])

    # Introduce a variable 'Total' that sums all the initiatives
compass_df_TargetFundingPivot2['Total'] = 0
for x in range(4, compass_df_TargetFundingPivot2.shape[1]-1):
       compass_df_TargetFundingPivot2['Total'] = compass_df_TargetFundingPivot2['Total'] + compass_df_TargetFundingPivot2.iloc[:,x]

    # Obtain the data in percent instead:
for x in range(4, compass_df_TargetFundingPivot2.shape[1]-1):
       compass_df_TargetFundingPivot2.iloc[:,x] = compass_df_TargetFundingPivot2.iloc[:,x]/ compass_df_TargetFundingPivot2['Total']

    # Format the percentages obtained
for x in range(4, compass_df_TargetFundingPivot2.shape[1]-1):
        compass_df_TargetFundingPivot2.iloc[:,x] = pd.Series(["{0:.5f}".format(val) for val in compass_df_TargetFundingPivot2.iloc[:,x]], index = compass_df_TargetFundingPivot2.index)
        
    # Format the data as float for display and computation of ratio
for x in range(3, compass_df_TargetFundingPivot2.shape[1]):
        compass_df_TargetFundingPivot2.iloc[:,x] = compass_df_TargetFundingPivot2.iloc[:,x].astype(float)

    # Compute ratio of targets
compass_df_TargetFundingPivot2["Ratio"] = compass_df_TargetFundingPivot2["BothINDORG"] / 2 + compass_df_TargetFundingPivot2["TargetOrganisations"] 

    # Select the countries to analyse
compass_df_TargetFundingPivot2_filtered = compass_df_TargetFundingPivot2.loc[compass_df_TargetFundingPivot2["CountryCode"].isin(list(my_countries["CountryCode"]))]


# amend the label to add the extended labels:
compass_df_TargetFundingPivot2_filtered = pd.merge(compass_df_TargetFundingPivot2_filtered,
                      compass_df_TargetFundingPivot_Label,
                      on ='CountryLabel', 
                      how ='inner')
    # Get rid of intermediate totals (not needed any more)
compass_df_TargetFundingPivot2_filtered['CountryLabel'] = compass_df_TargetFundingPivot2_filtered['CountryLabel2'] 
compass_df_TargetFundingPivot2_filtered = compass_df_TargetFundingPivot2_filtered.drop(['CountryLabel2'], axis=1).copy()
compass_df_TargetFundingPivot2_filtered

# Generate the graph:

import plotly.graph_objects as go

country = compass_df_TargetFundingPivot2_filtered['CountryLabel']

Ratio = compass_df_TargetFundingPivot2_filtered['Ratio']

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=Ratio,
    y=country,
    name='Researchers, students and teachers',
    marker=dict(
        size=13,
        cmax=1,
        cmin=0,
        color=compass_df_TargetFundingPivot2_filtered['isbudgeted'],
        colorscale='YlOrRd',
        showscale=True,
        colorbar=dict(
            title="Initiative funding data:",
            titleside="top",
            tickmode="array",
            tickvals=[0.1, 0.5, 0.9],
            ticktext=["None reported", "Reported for half <br>of initiaitves", "All reported"],
            tickfont=dict(family="verdana", color='#2a3f5f', size=10),
            title_font_family="verdana",
            title_font_size=13,
            title_font_color='#2a3f5f',
        ),
        line_color='rgba(156, 165, 196, 1.0)',
    )
))

fig.update_traces(mode='markers', marker=dict(line_width=1, symbol='circle', size=13))

fig.update_layout(
    title={
        'text': "Figure 4: Target groups in PRS initiatives <br> as a share of total budget-weighted initiatives within country ",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis=dict(
        showgrid=False,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=0.5,
        tickformat=".0%",
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),  
    margin=dict(l=140, r=40, b=50, t=100),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='left',
    ),
    width=800,
    height=300,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)

fig.update_yaxes(
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)

fig.update_xaxes(
    title_text='Funding targets',
    title_font_family="verdana",
    title_font_size=17,
    title_font_color='#2a3f5f',
    fixedrange=True,
    range=(-0.05, 1.1),
    ticktext=["Only researchers, students <br> & teachers targeted", "Both equally <br> targeted", "Only research & education <br> institutions targeted"],
    tickvals=[0.01, 0.5, 1],
    showgrid=True,
    gridcolor="#444",
#    gridfont=dict(family="Open Sans"),
    ticks="outside",
    tickson="boundaries",
    tickcolor="#444",
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)
 
fig.show()

The OECD median leans towards targeting public research organisations over funding of individual researchers. Among the five selected countries, initiatives in Canada and Korea appear to target individual researchers more often. Japan, Germany and France lean more towards targeting organisations than the OECD median. Funding details are reported for most initiatives for among the five countries although, as mentioned in Section 1, 35% of initiatives in Canada are missing budget data.

4. Focusing on Instruments¶

This section focuses on specific instruments relevant for public research:

Institutional funding for public research
Project grants for public research
Dedicated support for research infrastructures

While Section 3 focused on initiatives reported under the PRS policy area, this section looks at the full STIP Compass database. This is because the above instruments are reported in other areas of the survey (e.g. ) Table 3 indicates the number of instruments reported by country and instrument type. More details on the availability of funding data are provided in the following sections.

# Data preparation:

# Copy and filter the central repository:
compass_dfTreemap = compass_df[["OECD_STATUS", "InitiativeID", "CountryLabel", "CountryCode", "InstrumentTypeLabel", "InstrumentID"]].copy()

# Generate a table summing 'InstrumentID' numbers by instrment type
compass_df_pivot = pd.pivot_table(data=compass_dfTreemap, index=['OECD_STATUS', 'CountryCode', 'CountryLabel','InstrumentTypeLabel'], values=['InstrumentID'], aggfunc={'InstrumentID': pd.Series.nunique}, margins=True, margins_name='total', fill_value=0).reset_index()
compass_df_pivot = compass_df_pivot[(compass_df_pivot.InstrumentTypeLabel == "Institutional funding for public research") |(compass_df_pivot.InstrumentTypeLabel == "Project grants for public research") |(compass_df_pivot.InstrumentTypeLabel == "Dedicated support to research infrastructures")]
compass_df_pivot['InstrumentID'] = compass_df_pivot['InstrumentID'].astype(int)

compass_df_pivot=pd.pivot_table(compass_df_pivot, index=['CountryCode','OECD_STATUS','CountryLabel'], columns='InstrumentTypeLabel', values='InstrumentID').reset_index()

# Compute median in an extra row and append the row to dataframe
compass_df_pivot_OECD = compass_df_pivot[(compass_df_pivot.OECD_STATUS == "Member")]
new_row = {'OECD_STATUS':'Member', 'CountryLabel':'OECD Mean', 'CountryCode':'OECD2', 'Dedicated support to research infrastructures': round(compass_df_pivot_OECD['Dedicated support to research infrastructures'].mean(),2), 'Institutional funding for public research': round(compass_df_pivot_OECD['Institutional funding for public research'].mean(),2), 'Project grants for public research':round(compass_df_pivot_OECD['Project grants for public research'].mean(),2)}
compass_df_pivot = compass_df_pivot.append(new_row, ignore_index=True)

# filter for selected countries
compass_df_pivot = compass_df_pivot.loc[compass_df_pivot["CountryCode"].isin(list(my_countries["CountryCode"]))]
compass_df_pivot.reset_index()
compass_df_pivot = compass_df_pivot.drop(['CountryCode', 'OECD_STATUS'], axis=1).copy()


# reorder and rename the columns
compass_df_pivot = compass_df_pivot.reindex(columns=['CountryLabel', 'Institutional funding for public research', 'Project grants for public research', 'Dedicated support to research infrastructures'])
compass_df_pivot = compass_df_pivot.rename(columns={"Dedicated support to research infrastructures":"Dedicated support to <br>research infrastructures", "Institutional funding for public research":"Institutional funding <br>for public research", "Project grants for public research":"Project grants for <br>public research"})


# Generate a table
import plotly.figure_factory as ff

colorscale = [[0, '#272D31'],[.5, '#ffffff'],[1, '#ffffff']]

fig =  ff.create_table(compass_df_pivot, colorscale=colorscale)
fig.layout.width=850
fig.update_layout(title_text="Table 3: Number of instruments reported by country and selected instrument type",
            title_font_family="verdana",
            title_font_color='#2a3f5f',                            
            title_font_size=15)
fig.update_layout({'margin':{'t':40}})
fig.show()

4.1. Institutional funding for public research¶

The first series of graph looks at institutional funding for public research, including:.

Non-competitive grants funding public research organisations according to various criteria (e.g. research capacity and performance indicators) to fulfil their research missions.
Block funding provides public research organisations with stable resources and a certain degree of autonomy in their research activities.

4.1.1. Funding horizons¶

Figure 5 looks at average time-period of the institutional funding instruments. For each country, it indicates the NBW distribution of initiatives across three distinct budgeted horizons: 3 years or less, 4-6 years and 7 years or more. In some instances, the time horizon is not specified in the instrument survey: these cases appear as "Not specified" in the figure. For each time horizon, the size of the corresponding bubble is proportional to the NBW number of instruments within each country, as a share of total NBW instruments reported in the country. As in the previous figure, the colour shading indicates the extent to which the instruments have associated budget information. The number of institutional funding instruments reported are also indicated next to each country label.

# Data preparation:

    # Copy the central dataframe
compass_df_Instrument = compass_df.copy()

    # Filter on initiatives linked to "Institutional funding for public research":
compass_df_Instrument = compass_df_Instrument[(compass_df_Instrument.InstrumentTypeLabel == "Institutional funding for public research")]

    # Generate a table of initiatives by time-period for each country
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode', 'InstrumentID', 'NBW', 'InstrumentTypeLabel'], values=['isbudgeted', 'F41:V28','F41:V29','F41:V30'], aggfunc={'isbudgeted': pd.Series.mean, 'F41:V28': sum, 'F41:V29': sum, 'F41:V30': sum}, fill_value=0).reset_index()

    # Add an extra column to account for missing timeline:
compass_df_Instrument["F41:V31"] = 0
for x in range(6, compass_df_Instrument.shape[1]-1):
    compass_df_Instrument["F41:V31"] = compass_df_Instrument["F41:V31"] + compass_df_Instrument.iloc[:,x]

for x in range(0, compass_df_Instrument.shape[0]-1):
    if compass_df_Instrument["F41:V31"].loc[x] != 0:
        compass_df_Instrument["F41:V31"].loc[x]  = 0
    else:
        compass_df_Instrument["F41:V31"].loc[x]  = 1
        # Note: we've added "F41:V31" to the dictionary earlier

    # Reorder the columns to have variables on the right:
compass_df_Instrument = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'NBW', 'isbudgeted', 'F41:V28', 'F41:V29', 'F41:V30', 'F41:V31'])

compass_df_Instrument['isbudgeted28'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F41:V28']
compass_df_Instrument['isbudgeted29'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F41:V29']
compass_df_Instrument['isbudgeted30'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F41:V30']
compass_df_Instrument['isbudgeted31'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F41:V31']
compass_df_Instrument['total28'] = compass_df_Instrument['F41:V28']
compass_df_Instrument['total29'] = compass_df_Instrument['F41:V29']
compass_df_Instrument['total30'] = compass_df_Instrument['F41:V30']
compass_df_Instrument['total31'] = compass_df_Instrument['F41:V31']

# Weight each data by NBW
for x in range(6, compass_df_Instrument.shape[1]-8):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] * compass_df_Instrument['NBW']

# Aggregate the data by countries
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode'], values=['InstrumentID', 'isbudgeted', 'isbudgeted28', 'isbudgeted29', 'isbudgeted30', 'isbudgeted31', 'total28', 'total29', 'total30', 'total31', 'F41:V28','F41:V29','F41:V30', 'F41:V31'], aggfunc={'InstrumentID':pd.Series.nunique, 'isbudgeted': pd.Series.mean, 'isbudgeted28': sum, 'isbudgeted29': sum, 'isbudgeted30': sum, 'isbudgeted31': sum, 'total28': sum, 'total29': sum, 'total30': sum, 'total31': sum, 'F41:V28': sum, 'F41:V29': sum, 'F41:V30': sum, 'F41:V31': sum}, margins=True, margins_name='total2', fill_value=0).reset_index()
index_names = compass_df_Instrument[(compass_df_Instrument['OECD_STATUS'] == '')].index
compass_df_Instrument.drop(index_names, inplace = True)
       
compass_df_Instrument['isbudgeted28'] = compass_df_Instrument['isbudgeted28'] / compass_df_Instrument['total28'] 
compass_df_Instrument['isbudgeted29'] = compass_df_Instrument['isbudgeted29'] / compass_df_Instrument['total29'] 
compass_df_Instrument['isbudgeted30'] = compass_df_Instrument['isbudgeted30'] / compass_df_Instrument['total30'] 
compass_df_Instrument['isbudgeted31'] = compass_df_Instrument['isbudgeted31'] / compass_df_Instrument['total31'] 

# Get rid of intermediate totals (not needed any more)
compass_df_Instrument = compass_df_Instrument.drop(['total28', 'total29', 'total30', 'total31'], axis=1).copy()

# Generate a dataframe filtered on OECD member only:
compass_df_Instrument_bis = compass_df_Instrument[compass_df_Instrument.OECD_STATUS == "Member"]
new_row = {'CountryLabel':'OECD Mean', 'OECD_STATUS': 'Member', 'CountryCode':'OECD2', 'F41:V28': round(compass_df_Instrument_bis['F41:V28'].mean(),1), 'F41:V29': round(compass_df_Instrument_bis['F41:V29'].mean(),1), 'F41:V30': round(compass_df_Instrument_bis['F41:V30'].mean(),1), 'F41:V31': round(compass_df_Instrument_bis['F41:V31'].mean(),1), 'isbudgeted': compass_df_Instrument_bis['isbudgeted'].mean(), 'isbudgeted28': compass_df_Instrument_bis['isbudgeted28'].mean(), 'isbudgeted29': compass_df_Instrument_bis['isbudgeted29'].mean(), 'isbudgeted30': compass_df_Instrument_bis['isbudgeted30'].mean(), 'isbudgeted31': compass_df_Instrument_bis['isbudgeted31'].mean(), 'InstrumentID': compass_df_Instrument_bis['InstrumentID'].mean()}
new_row2 = {'CountryLabel':'OECD Median', 'OECD_STATUS': 'Member', 'CountryCode':'OECD', 'F41:V28': round(compass_df_Instrument_bis['F41:V28'].median(),1), 'F41:V29': round(compass_df_Instrument_bis['F41:V29'].median(),1), 'F41:V30': round(compass_df_Instrument_bis['F41:V30'].median(),1), 'F41:V31': round(compass_df_Instrument_bis['F41:V31'].median(),1), 'isbudgeted': compass_df_Instrument_bis['isbudgeted'].median(), 'isbudgeted28': compass_df_Instrument_bis['isbudgeted28'].median(), 'isbudgeted29': compass_df_Instrument_bis['isbudgeted29'].median(), 'isbudgeted30': compass_df_Instrument_bis['isbudgeted30'].median(), 'isbudgeted31': compass_df_Instrument_bis['isbudgeted31'].median(), 'InstrumentID': compass_df_Instrument_bis['InstrumentID'].median()}

# Append the OECD data to the dataframes
compass_df_Instrument = compass_df_Instrument.append(new_row, ignore_index=True)


# Generate new labels dataframe
compass_df_Instrument_Label = compass_df_Instrument[['CountryLabel', 'isbudgeted', 'InstrumentID']].copy()
compass_df_Instrument_Label['CountryLabel2'] = compass_df_Instrument_Label['CountryLabel']


compass_df_Instrument_Label['isbudgeted'] = compass_df_Instrument_Label['InstrumentID'] * compass_df_Instrument_Label['isbudgeted']

for x in range(0, compass_df_Instrument_Label.shape[0]):
    if compass_df_Instrument_Label["CountryLabel"].loc[x] == "OECD Mean":
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " (" + (round(compass_df_Instrument_Label['InstrumentID'].loc[x],1).astype(float)).astype(str) + ") "
    else:
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " (" + (compass_df_Instrument_Label['isbudgeted'].loc[x].astype(int)).astype(str) + " / " + (compass_df_Instrument_Label['InstrumentID'].loc[x].astype(int)).astype(str) + " ) "
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " ("  + (compass_df_Instrument_Label['InstrumentID'].loc[x].astype(int)).astype(str) + ") "

# Prepare the dataframe for graph inclusion:
# Reorder the columns to have variables on the right:
compass_df_Instrument = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted28', 'isbudgeted29', 'isbudgeted30', 'isbudgeted31','F41:V28', 'F41:V29', 'F41:V30', 'F41:V31', 'F41:V32'])

# The dataframe obtained shows counts of initiatives for each time horizon - Compute the proportion instead
compass_df_Instrument["F41:V32"] = 0
for x in range(9, compass_df_Instrument.shape[1]-1):
    compass_df_Instrument["F41:V32"] = compass_df_Instrument["F41:V32"] + compass_df_Instrument.iloc[:,x]
for x in range(9, compass_df_Instrument.shape[1]-1):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] / compass_df_Instrument["F41:V32"] #compass_df_Instrument.iloc[:,compass_df_Instrument.shape[1]]
    

    # Select the countries to analyse
compass_df_Instrument2 = compass_df_Instrument.loc[compass_df_Instrument["CountryCode"].isin(list(my_countries["CountryCode"]))]

    # Trim the dataframe to only core data and transpose it
compass_df_Instrument2 = compass_df_Instrument2[['CountryLabel', 'InstrumentID', 'isbudgeted', 'isbudgeted28', 'isbudgeted29', 'isbudgeted30', 'isbudgeted31','F41:V28', 'F41:V29', 'F41:V30', 'F41:V31']].copy()

    # rename the InstrumentID to total
compass_df_Instrument2 = compass_df_Instrument2.rename(columns={"InstrumentID":"Total"})

    # Retrieve headers frim Dict_Headers
d = Dict_Headers.set_index('Code').to_dict()
compass_df_Instrument2.columns = compass_df_Instrument2.columns.to_series().map(d['Long Description'])
compass_df_Instrument2

    # melt the dataframe (twice, one for the NBW by timelines, one for the initiatives funded (i.e the color)
compass_df_Instrument3 = compass_df_Instrument2.copy()
new_header3 = list(compass_df_Instrument3.columns[7:].values)
compass_df_Instrument3 = compass_df_Instrument3.melt(id_vars=["CountryLabel", "isbudgeted"], value_vars=new_header3, var_name='Variable', value_name='Value')
compass_df_Instrument3 = compass_df_Instrument3.sort_values(["Variable", "CountryLabel", "isbudgeted"]).copy()
compass_df_Instrument4 = compass_df_Instrument2.copy()
new_header4 = list(compass_df_Instrument4.columns[3:7].values)
compass_df_Instrument4 = compass_df_Instrument2.melt(id_vars=["CountryLabel", "Total"], value_vars=new_header4, var_name='Variable', value_name='Color')
compass_df_Instrument4 = compass_df_Instrument4.sort_values(["Color", "CountryLabel"]).copy()
compass_df_Instrument3["Color"] = compass_df_Instrument4["Color"]
compass_df_Instrument2 = compass_df_Instrument3.copy()


# amend the label to add the extended labels:
compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      compass_df_Instrument_Label, 
                      on ='CountryLabel', 
                      how ='inner')
    # Get rid of intermediate totals (not needed any more)
compass_df_Instrument2['CountryLabel'] = compass_df_Instrument2['CountryLabel2'] 
compass_df_Instrument2 = compass_df_Instrument2.drop(['InstrumentID', 'isbudgeted_x', 'isbudgeted_y', 'CountryLabel2'], axis=1).copy()
compass_df_Instrument2['Value'] = compass_df_Instrument2['Value'].fillna(0)
compass_df_Instrument2['Color'] = compass_df_Instrument2['Color'].fillna(0)


    # assign numerical values to x-axis values
compass_df_Instrument2["axislocation"] = 1

for x in range(0, compass_df_Instrument2.shape[0]-1):
        if compass_df_Instrument2["Variable"].loc[x] == "3 years or less":
            compass_df_Instrument2["axislocation"].loc[x]  = 0.25
        elif compass_df_Instrument2["Variable"].loc[x] == "4-6 years":
            compass_df_Instrument2["axislocation"].loc[x]  = 0.5
        elif compass_df_Instrument2["Variable"].loc[x] == "7 years or more":
            compass_df_Instrument2["axislocation"].loc[x]  = 0.75
        else:
            compass_df_Instrument2["axislocation"].loc[x]  = 1
                                  
# Generate the graph:

import plotly.graph_objects as go

Value = compass_df_Instrument2["Value"]

fig = go.Figure(data=[go.Scatter(
    x= compass_df_Instrument2["axislocation"],
    y= compass_df_Instrument2["CountryLabel"],
    text=compass_df_Instrument2["Value"].map(lambda n: '{:,.2%}'.format(n)),
    texttemplate = "{0:.2f}%",
    mode='markers',
    marker=dict(
        size=Value,
        line=dict(width=2,color='DarkSlateGrey'),
        sizemode='area',
        sizeref=2.*max(Value)/(40.**2),
        sizemin=0,
        cmax=1,
        cmin=0,
        color=compass_df_Instrument2["Color"], 
        colorscale='YlOrRd',
        showscale=True,
        colorbar=dict(
            title="Instrument funding data:",
            titleside="top",
            tickmode="array",
            tickvals=[0.1, 0.5, 0.9],
            ticktext=["None reported", "Reported for half <br>of instruments", "All reported"],
            tickfont=dict(family="verdana", color='#2a3f5f', size=10),
            title_font_family="verdana",
            title_font_size=13,
            title_font_color='#2a3f5f')
    )
)])

fig.update_layout(
    title={
        'text': "Figure 5: Budget horizons in institutional funding for public research <br> as a share of total budget-weighted instruments reported",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis=dict(
        showgrid=False,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=0.5,
        tickformat=".0%",
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),
    margin=dict(l=140, r=40, b=50, t=100),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='left',
    ),
    width=800,
    height=450,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)

fig.update_yaxes(
    tickfont=dict(family="verdana", color='#2a3f5f', size=13),
    ticks="inside",
    tickson="boundaries",
    tickcolor="#444",
    gridwidth=1,
    gridcolor="#444",
    showgrid=True
)

fig.update_xaxes(
    title_text='Funding horizon',
    title_font_family="verdana",
    title_font_size=17,
    title_font_color='#2a3f5f',
    fixedrange=True,
    range=(0.1, 1.1),
    ticktext=['3 years or less', '4-6 years', '7 years or more', 'Not specified'],
    tickvals=[0.25, 0.5, 0.75, 1],
    showgrid=False,
    gridwidth=1,
    gridcolor="#444",
    ticks="outside",
    tickson="boundaries",
    tickcolor="#444",
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)

fig.show()

In average, most of the OECD’s institutional funding for public research is short term (3 years or less) or mid-term (4-6 years). While Korea follows the same pattern, the other four selected countries have a longer-term focus in comparison. Japan, Germany, and Canada have a larger bulk of mid-term initiatives (4-6 years). France allocates more budget to a mix of long-term (>7 years) and short-term (3 years or less) initiatives.

4.1.2. Performance-based element to the allocation¶

Allocation of institutional funding for public research can be done on the basis of performance-based contracts or criteria. Figure 6 looks at the prevalence of performance-based element to the allocation of funding. For each country, a value is assigned for each initiative: 1 when a performance-based element is reported, and 0 otherwise. For each country, an aggregate ratio is then computed by aggregating these values and the associated initiatives’ budget weight coefficients, to summarise country prevalences of performance-based elements in the allocation of institutional funding.

# Data preparation:

    # Copy the central dataframe
compass_df_Instrument2 = compass_df.copy()

    # Filter on initiatives linked to "Institutional funding for public research"
compass_df_Instrument2 = compass_df_Instrument2[(compass_df_Instrument2.InstrumentTypeLabel == "Institutional funding for public research")]

    # Generate a table of initiatives
compass_df_Instrument2 = compass_df_Instrument2[['InstrumentID', 'OECD_STATUS', 'CountryLabel', 'InstrumentTypeLabel', 'CountryCode', 'NBW', 'F38:V4', 'F38:V5', 'isbudgeted']].copy()

    # Weight initiatives by NBW
compass_df_Instrument2['F38:V4'] = compass_df_Instrument2['F38:V4'] * compass_df_Instrument2['NBW']
compass_df_Instrument2['F38:V5'] = compass_df_Instrument2['F38:V5'] * compass_df_Instrument2['NBW']

    # Generate a table of initiatives for each country
compass_df_Instrument2 = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'OECD_STATUS', 'CountryCode', 'InstrumentTypeLabel'], values=['isbudgeted','InstrumentID', 'F38:V4','F38:V5'], aggfunc={'isbudgeted': pd.Series.mean, 'InstrumentID': pd.Series.nunique, 'F38:V4': sum, 'F38:V5': sum}, fill_value=0).reset_index()


# Introduce a variable 'Ratio' that reflects the propotion of performance-based initiatives to total initiatives
compass_df_Instrument2['Ratio'] = compass_df_Instrument2['F38:V4'] / (compass_df_Instrument2['F38:V4'] + compass_df_Instrument2['F38:V5'])
compass_df_Instrument2['Ratio'] = compass_df_Instrument2['Ratio'].astype(float)


# Complement with OECD data:

    # Generate a dataframe filtered on OECD member only:
compass_df_Instrument2_bis = compass_df_Instrument2[compass_df_Instrument2.OECD_STATUS == "Member"]
new_row = {'CountryLabel':'OECD Mean', 'OECD_STATUS': 'Member', 'CountryCode':'OECD2', 'InstrumentTypeLabel': 'Institutional funding for public research', 'F38:V4': round(compass_df_Instrument2_bis['F38:V4'].mean(),1), 'F38:V5': round(compass_df_Instrument2_bis['F38:V5'].mean(),1), 'Ratio': round((compass_df_Instrument2_bis['F38:V4'].mean()/(compass_df_Instrument2_bis['F38:V4'].mean()+compass_df_Instrument2_bis['F38:V5'].mean())),4), 'InstrumentID': round(compass_df_Instrument2_bis['InstrumentID'].mean(),1), 'isbudgeted': round(compass_df_Instrument2_bis['isbudgeted'].mean(),1)}
new_row2 = {'CountryLabel':'OECD Median', 'OECD_STATUS': 'Member', 'CountryCode':'OECD', 'InstrumentTypeLabel': 'Institutional funding for public research', 'F38:V4': round(compass_df_Instrument2_bis['F38:V4'].median(),1), 'F38:V5': round(compass_df_Instrument2_bis['F38:V5'].median(),1), 'Ratio': round((compass_df_Instrument2_bis['F38:V4'].median()/(compass_df_Instrument2_bis['F38:V4'].median()+compass_df_Instrument2_bis['F38:V5'].median())),4), 'InstrumentID': round(compass_df_Instrument2_bis['InstrumentID'].median(),1), 'isbudgeted': round(compass_df_Instrument2_bis['isbudgeted'].median(),1)}

    # Append the OECD data to the dataframes
compass_df_Instrument2 = compass_df_Instrument2.append(new_row, ignore_index=True)



# Generate new labels dataframe
compass_df_Instrument_Label = compass_df_Instrument2[['CountryLabel', 'isbudgeted', 'InstrumentID']].copy()
compass_df_Instrument_Label['CountryLabel2'] = compass_df_Instrument_Label['CountryLabel']
compass_df_Instrument_Label['isbudgeted'] = compass_df_Instrument_Label['InstrumentID'] * compass_df_Instrument_Label['isbudgeted']


for x in range(0, compass_df_Instrument_Label.shape[0]):
    if compass_df_Instrument_Label["CountryLabel"].loc[x] == "OECD Mean":
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " (" + (round(compass_df_Instrument_Label['InstrumentID'].loc[x],1).astype(float)).astype(str) + ")"
    else:
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " (" + (compass_df_Instrument_Label['InstrumentID'].loc[x].astype(int)).astype(str) + ")"
                
# Prepare the dataframe for graph inclusion:
    # Due to lack of data, some countries must be removed: remove COUNTRY < 3 initiatives:
index_names = compass_df_Instrument2[(compass_df_Instrument2['F38:V4'] + compass_df_Instrument2['F38:V5']) < 3 ].index
compass_df_Instrument2 = compass_df_Instrument2.drop(index_names, inplace = False)

    # Sort the dataframe in ascending Ratio:
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='Ratio', ascending=False)


    # Select the countries to analyse
compass_df_Instrument2 = compass_df_Instrument2.loc[compass_df_Instrument2["CountryCode"].isin(list(my_countries["CountryCode"]))]


    # update the label to used amended labels

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                        compass_df_Instrument_Label, 
                        on ='CountryLabel', 
                        how ='inner')
    # Get rid of intermediate totals (not needed any more)
compass_df_Instrument2['CountryLabel'] = compass_df_Instrument2['CountryLabel2'] 
compass_df_Instrument2 = compass_df_Instrument2.drop(['InstrumentID_x', 'InstrumentID_y', 'isbudgeted_x', 'isbudgeted_y', 'CountryLabel2'], axis=1).copy()



# Generate the graph:

import plotly.graph_objects as go

country = compass_df_Instrument2['CountryLabel']

Ratio = compass_df_Instrument2['Ratio']

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=Ratio,
    y=country,
    name='Performance based elelement to the allocation',
    marker=dict(
        size=13,
        cmax=1,
        cmin=0,
        color=compass_df_TargetFundingPivot2_filtered['isbudgeted'],
        colorscale='YlOrRd',
        showscale=True,
        colorbar=dict(
            title="Instrument funding data:",
            titleside="top",
            tickmode="array",
            tickvals=[0.1, 0.5, 0.9],
            ticktext=["None reported", "Reported for half <br>of instruments", "All reported"],
            tickfont=dict(family="verdana", color='#2a3f5f', size=10),
            title_font_family="verdana",
            title_font_size=13,
            title_font_color='#2a3f5f',
        ),
        line_color='rgba(156, 165, 196, 1.0)',
    )
))

fig.update_traces(mode='markers', marker=dict(line_width=1, symbol='circle', size=13))

fig.update_layout(
    title={
        'text': "Figure 6: 'Institutional funding for public research' instruments <br>with a performance-based allocation element <br> as a share of total budget-weighted instruments within country",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis=dict(
        showgrid=False,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=0.5,
        tickformat=".0%",
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),
    margin=dict(l=140, r=40, b=50, t=100),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='left',
    ),
    width=800,
    height=300,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)
fig.update_yaxes(
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)
fig.update_xaxes(
    title_text='Performance-based allocation',
    title_font_family="verdana",
    title_font_size=17,
    title_font_color='#2a3f5f',
    fixedrange=True,
    range=(-0.05, 1.1),
    ticktext=["Never", "Half the time", "Always"],
    tickvals=[0.01, 0.5, 1],
    showgrid=True,
    gridwidth=1,
    gridcolor="#444",
    ticks="outside",
    tickson="boundaries",
    tickcolor="#444",
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)

fig.show()

On average, two thirds of OECD area’s NBW institutional funding instruments will include performance criteria in the allocation process. Great variance can be observed across the five selected countries. While Germany allocated funding based on performance criteria in less than half of the NBW instruments, Japan and particularly Korea use more systematically performance-based criteria. Canada and France gravitate more closely around the OECD mean.

4.1.3. Allocation criteria¶

Figure 7 focuses on the various types of criteria used to allocate institutional funding for public research. The STIP Compass dataset considers different types of criteria but, as we will see, some are more intensively cited compared to others. The figure aggregates the NBW frequency of allocation criteria by country. The criteria are ranked by degree of importance among the selected countries and are displayed by decreasing order of prevalence. For each country, the bubble sizes are proportional to the corresponding criteria's prevalence in the overall pool of NBW instruments.

# Data preparation:
    # Copy the central dataframe
compass_df_Instrument2 = compass_df.copy()

    # Filter on initiatives linked to "Institutional funding for public research"
compass_df_Instrument2 = compass_df_Instrument2[(compass_df_Instrument2.InstrumentTypeLabel == "Institutional funding for public research")].copy()

    # Generate a table of initiatives
compass_df_Instrument2 = compass_df_Instrument2[['InstrumentID', 'OECD_STATUS', 'CountryLabel', 'CountryCode', 'isbudgeted', 'NBW', 'F40:V7', 'F40:V8', 'F40:V9', 'F40:V10', 'F40:V11', 'F40:V12', 'F40:V14', 'F40:V13', 'F40:V15', 'F40:V16', 'F40:V17', 'F40:V18', 'F40:V668', 'F40:V800', 'F40:V801', 'F40:V669']].copy()

    # inroduce 2 sets of variables, "isbudgetedXX" and "totalXX"

mapping_labels={'BUDGET':['isbudgeted7','isbudgeted8','isbudgeted9','isbudgeted10','isbudgeted11','isbudgeted12','isbudgeted13','isbudgeted14','isbudgeted15','isbudgeted16','isbudgeted17','isbudgeted18','isbudgeted668','isbudgeted669', 'isbudgeted800', 'isbudgeted801'], 
                'TOTAL':['total7', 'total8', 'total9', 'total10', 'total11', 'total12', 'total13', 'total14', 'total15', 'total16', 'total17', 'total18', 'total668', 'total669', 'total800', 'total801'],
                'VARIABLE':['F40:V7', 'F40:V8', 'F40:V9', 'F40:V10', 'F40:V11', 'F40:V12', 'F40:V13', 'F40:V14', 'F40:V15', 'F40:V16', 'F40:V17', 'F40:V18', 'F40:V668', 'F40:V669', 'F40:V800', 'F40:V801']}

mapping_labels=pd.DataFrame(mapping_labels)

    
compass_df_Instrument2['isbudgeted7'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V7']
compass_df_Instrument2['isbudgeted8'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V8']
compass_df_Instrument2['isbudgeted9'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V9']
compass_df_Instrument2['isbudgeted10'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V10']
compass_df_Instrument2['isbudgeted11'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V11']
compass_df_Instrument2['isbudgeted12'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V12']
compass_df_Instrument2['isbudgeted13'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V13']
compass_df_Instrument2['isbudgeted14'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V14']
compass_df_Instrument2['isbudgeted15'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V15']
compass_df_Instrument2['isbudgeted16'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V16']
compass_df_Instrument2['isbudgeted17'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V17']
compass_df_Instrument2['isbudgeted18'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V18']
compass_df_Instrument2['isbudgeted668'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V668']
compass_df_Instrument2['isbudgeted800'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V800']
compass_df_Instrument2['isbudgeted801'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V801']
compass_df_Instrument2['isbudgeted669'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V669']

compass_df_Instrument2['total7'] = compass_df_Instrument2['F40:V7']
compass_df_Instrument2['total8'] = compass_df_Instrument2['F40:V8']
compass_df_Instrument2['total9'] = compass_df_Instrument2['F40:V9']
compass_df_Instrument2['total10'] = compass_df_Instrument2['F40:V10']
compass_df_Instrument2['total11'] = compass_df_Instrument2['F40:V11']
compass_df_Instrument2['total12'] = compass_df_Instrument2['F40:V12']
compass_df_Instrument2['total13'] = compass_df_Instrument2['F40:V13']
compass_df_Instrument2['total14'] = compass_df_Instrument2['F40:V14']
compass_df_Instrument2['total15'] = compass_df_Instrument2['F40:V15']
compass_df_Instrument2['total16'] = compass_df_Instrument2['F40:V16']
compass_df_Instrument2['total17'] = compass_df_Instrument2['F40:V17']
compass_df_Instrument2['total18'] = compass_df_Instrument2['F40:V18']
compass_df_Instrument2['total668'] = compass_df_Instrument2['F40:V668']
compass_df_Instrument2['total800'] = compass_df_Instrument2['F40:V800']
compass_df_Instrument2['total801'] = compass_df_Instrument2['F40:V801']
compass_df_Instrument2['total669'] = compass_df_Instrument2['F40:V669']

    # Weight initiatives by NBW
for x in range(6, (compass_df_Instrument2.shape[1])-32):
    compass_df_Instrument2.iloc[:,x] = compass_df_Instrument2.iloc[:,x] * compass_df_Instrument2['NBW']

    # Introduce a variable 'Total' that reflects the proportion of performance-based initiatives to total initiatives
compass_df_Instrument2['Total'] = compass_df_Instrument2['F40:V7'] + compass_df_Instrument2['F40:V8'] + compass_df_Instrument2['F40:V9'] + compass_df_Instrument2['F40:V10'] + compass_df_Instrument2['F40:V11'] + compass_df_Instrument2['F40:V12'] + compass_df_Instrument2['F40:V14'] + compass_df_Instrument2['F40:V13'] + compass_df_Instrument2['F40:V15'] + compass_df_Instrument2['F40:V16'] + compass_df_Instrument2['F40:V17'] + compass_df_Instrument2['F40:V18'] + compass_df_Instrument2['F40:V668'] + compass_df_Instrument2['F40:V800'] + compass_df_Instrument2['F40:V801'] + compass_df_Instrument2['F40:V669']
compass_df_Instrument2 = compass_df_Instrument2.drop_duplicates()
compass_df_Instrument2['Unique'] = 1


    # Reorder the columns to have variables on the right:
compass_df_Instrument2_isbudgeted = compass_df_Instrument2.copy()

    # Aggregate data by countries in two distinct dataframe
compass_df_Instrument2 = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'OECD_STATUS', 'CountryCode'], values=['Total', 'isbudgeted', 'Unique', 'F40:V7','F40:V8','F40:V9','F40:V10','F40:V11','F40:V12','F40:V13','F40:V14','F40:V15','F40:V16','F40:V17','F40:V18','F40:V668','F40:V800','F40:V801','F40:V669'], aggfunc={'Total': sum, 'Unique': sum, 'isbudgeted': pd.Series.mean, 'F40:V7': sum,'F40:V8': sum,'F40:V9': sum,'F40:V10': sum,'F40:V11': sum,'F40:V12': sum,'F40:V13': sum,'F40:V14': sum,'F40:V15': sum,'F40:V16': sum,'F40:V17': sum,'F40:V18': sum,'F40:V668': sum,'F40:V800': sum,'F40:V801': sum,'F40:V669': sum}, fill_value=0).reset_index()
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='CountryLabel', ascending=True)

compass_df_Instrument2_isbudgeted = pd.pivot_table(data=compass_df_Instrument2_isbudgeted, index=['CountryLabel', 'OECD_STATUS', 'CountryCode'], values=['isbudgeted7', 'isbudgeted8', 'isbudgeted9', 'isbudgeted10', 'isbudgeted11', 'isbudgeted12', 'isbudgeted13', 'isbudgeted14', 'isbudgeted15', 'isbudgeted16', 'isbudgeted17', 'isbudgeted18', 'isbudgeted668', 'isbudgeted800', 'isbudgeted801', 'isbudgeted669', 'total7', 'total8', 'total9', 'total10', 'total11', 'total12', 'total13', 'total14', 'total15', 'total16', 'total17', 'total18', 'total668', 'total800', 'total801', 'total669'], aggfunc={'isbudgeted7': sum, 'isbudgeted8': sum, 'isbudgeted9': sum, 'isbudgeted10': sum, 'isbudgeted11': sum, 'isbudgeted12': sum, 'isbudgeted13': sum, 'isbudgeted14': sum, 'isbudgeted15': sum, 'isbudgeted16': sum, 'isbudgeted17': sum, 'isbudgeted18': sum, 'isbudgeted668': sum, 'isbudgeted800': sum, 'isbudgeted801': sum, 'isbudgeted669': sum, 'total7': sum, 'total8': sum, 'total9': sum, 'total10': sum, 'total11': sum, 'total12': sum, 'total13': sum, 'total14': sum, 'total15': sum, 'total16': sum, 'total17': sum, 'total18': sum, 'total668': sum, 'total800': sum, 'total801': sum, 'total669': sum}, fill_value=0).reset_index()
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='CountryLabel', ascending=True)

compass_df_Instrument2_isbudgeted['isbudgeted7'] = compass_df_Instrument2_isbudgeted['isbudgeted7'] / compass_df_Instrument2_isbudgeted['total7'] 
compass_df_Instrument2_isbudgeted['isbudgeted8'] = compass_df_Instrument2_isbudgeted['isbudgeted8'] / compass_df_Instrument2_isbudgeted['total8'] 
compass_df_Instrument2_isbudgeted['isbudgeted9'] = compass_df_Instrument2_isbudgeted['isbudgeted9'] / compass_df_Instrument2_isbudgeted['total9'] 
compass_df_Instrument2_isbudgeted['isbudgeted10'] = compass_df_Instrument2_isbudgeted['isbudgeted10'] / compass_df_Instrument2_isbudgeted['total10'] 
compass_df_Instrument2_isbudgeted['isbudgeted11'] = compass_df_Instrument2_isbudgeted['isbudgeted11'] / compass_df_Instrument2_isbudgeted['total11'] 
compass_df_Instrument2_isbudgeted['isbudgeted12'] = compass_df_Instrument2_isbudgeted['isbudgeted12'] / compass_df_Instrument2_isbudgeted['total12'] 
compass_df_Instrument2_isbudgeted['isbudgeted13'] = compass_df_Instrument2_isbudgeted['isbudgeted13'] / compass_df_Instrument2_isbudgeted['total13'] 
compass_df_Instrument2_isbudgeted['isbudgeted14'] = compass_df_Instrument2_isbudgeted['isbudgeted14'] / compass_df_Instrument2_isbudgeted['total14'] 
compass_df_Instrument2_isbudgeted['isbudgeted15'] = compass_df_Instrument2_isbudgeted['isbudgeted15'] / compass_df_Instrument2_isbudgeted['total15'] 
compass_df_Instrument2_isbudgeted['isbudgeted16'] = compass_df_Instrument2_isbudgeted['isbudgeted16'] / compass_df_Instrument2_isbudgeted['total16'] 
compass_df_Instrument2_isbudgeted['isbudgeted17'] = compass_df_Instrument2_isbudgeted['isbudgeted17'] / compass_df_Instrument2_isbudgeted['total17'] 
compass_df_Instrument2_isbudgeted['isbudgeted18'] = compass_df_Instrument2_isbudgeted['isbudgeted18'] / compass_df_Instrument2_isbudgeted['total18'] 
compass_df_Instrument2_isbudgeted['isbudgeted668'] = compass_df_Instrument2_isbudgeted['isbudgeted668'] / compass_df_Instrument2_isbudgeted['total668'] 
compass_df_Instrument2_isbudgeted['isbudgeted669'] = compass_df_Instrument2_isbudgeted['isbudgeted669'] / compass_df_Instrument2_isbudgeted['total669'] 
compass_df_Instrument2_isbudgeted['isbudgeted800'] = compass_df_Instrument2_isbudgeted['isbudgeted800'] / compass_df_Instrument2_isbudgeted['total800'] 
compass_df_Instrument2_isbudgeted['isbudgeted801'] = compass_df_Instrument2_isbudgeted['isbudgeted801'] / compass_df_Instrument2_isbudgeted['total801'] 

# Get rid of intermediate totals (not needed any more)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['total7', 'total8', 'total9', 'total10', 'total11', 'total12', 'total13', 'total14', 'total15', 'total16', 'total17', 'total18', 'total668', 'total669', 'total800', 'total801'], axis=1).copy()
    

# Complement with OECD data:
    # Generate a dataframe filtered on OECD member only:
compass_df_Instrument2_bis = compass_df_Instrument2[compass_df_Instrument2.OECD_STATUS == "Member"]
compass_df_Instrument2_isbudgeted_bis = compass_df_Instrument2_isbudgeted[compass_df_Instrument2_isbudgeted.OECD_STATUS == "Member"]

    # Compute median (doesnt make much sense here)
new_row = {'CountryLabel':'OECD Median', 'OECD_STATUS': "Member", 'CountryCode':'OECD', 'Total': compass_df_Instrument2_bis['Total'].median(), 'Unique': compass_df_Instrument2_bis['Unique'].median(), 'F40:V7': compass_df_Instrument2_bis['F40:V7'].median(), 
           'F40:V8': compass_df_Instrument2_bis['F40:V8'].median(), 'F40:V9': compass_df_Instrument2_bis['F40:V9'].median(), 'F40:V10': compass_df_Instrument2_bis['F40:V10'].median(), 'F40:V11': compass_df_Instrument2_bis['F40:V11'].median(), 'F40:V12': compass_df_Instrument2_bis['F40:V12'].median(), 'F40:V13': compass_df_Instrument2_bis['F40:V13'].median(), 
           'F40:V14': compass_df_Instrument2_bis['F40:V14'].median(), 'F40:V15': compass_df_Instrument2_bis['F40:V15'].median(), 'F40:V16': compass_df_Instrument2_bis['F40:V16'].median(), 'F40:V17': compass_df_Instrument2_bis['F40:V17'].median(), 'F40:V18': compass_df_Instrument2_bis['F40:V18'].median(), 'F40:V668': compass_df_Instrument2_bis['F40:V668'].median(), 'F40:V800': compass_df_Instrument2_bis['F40:V800'].median(), 'F40:V801': compass_df_Instrument2_bis['F40:V801'].median(), 
           'F40:V669': compass_df_Instrument2_bis['F40:V669'].median(), 'isbudgeted': compass_df_Instrument2_bis['isbudgeted'].median()}

new_row_isbudgeted = {'CountryLabel':'OECD Median', 'OECD_STATUS': "Member", 'CountryCode':'OECD', 'isbudgeted7': compass_df_Instrument2_isbudgeted_bis['isbudgeted7'].median(), 'isbudgeted8': compass_df_Instrument2_isbudgeted_bis['isbudgeted8'].median(), 'isbudgeted9': compass_df_Instrument2_isbudgeted_bis['isbudgeted9'].median(), 
           'isbudgeted10': compass_df_Instrument2_isbudgeted_bis['isbudgeted10'].median(), 'isbudgeted11': compass_df_Instrument2_isbudgeted_bis['isbudgeted11'].median(), 'isbudgeted12': compass_df_Instrument2_isbudgeted_bis['isbudgeted12'].median(), 'isbudgeted13': compass_df_Instrument2_isbudgeted_bis['isbudgeted13'].median(), 'isbudgeted14': compass_df_Instrument2_isbudgeted_bis['isbudgeted14'].median(), 'isbudgeted15': compass_df_Instrument2_isbudgeted_bis['isbudgeted15'].median(), 
           'isbudgeted16': compass_df_Instrument2_isbudgeted_bis['isbudgeted16'].median(), 'isbudgeted17': compass_df_Instrument2_isbudgeted_bis['isbudgeted17'].median(), 'isbudgeted18': compass_df_Instrument2_isbudgeted_bis['isbudgeted18'].median(), 'isbudgeted668': compass_df_Instrument2_isbudgeted_bis['isbudgeted668'].median(), 'isbudgeted669': compass_df_Instrument2_isbudgeted_bis['isbudgeted669'].median(), 'isbudgeted800': compass_df_Instrument2_isbudgeted_bis['isbudgeted800'].median(), 'isbudgeted801': compass_df_Instrument2_isbudgeted_bis['isbudgeted801'].median()}


    # Compute mean
new_row2 = {'CountryLabel':'OECD Mean', 'OECD_STATUS': "Member", 'CountryCode':'OECD2', 'Total': compass_df_Instrument2_bis['Total'].mean(), 'Unique': compass_df_Instrument2_bis['Unique'].mean(), 'F40:V7': compass_df_Instrument2_bis['F40:V7'].mean(), 
           'F40:V8': compass_df_Instrument2_bis['F40:V8'].mean(), 'F40:V9': compass_df_Instrument2_bis['F40:V9'].mean(), 'F40:V10': compass_df_Instrument2_bis['F40:V10'].mean(), 'F40:V11': compass_df_Instrument2_bis['F40:V11'].mean(), 'F40:V12': compass_df_Instrument2_bis['F40:V12'].mean(), 'F40:V13': compass_df_Instrument2_bis['F40:V13'].mean(), 
           'F40:V14': compass_df_Instrument2_bis['F40:V14'].mean(), 'F40:V15': compass_df_Instrument2_bis['F40:V15'].mean(), 'F40:V16': compass_df_Instrument2_bis['F40:V16'].mean(), 'F40:V17': compass_df_Instrument2_bis['F40:V17'].mean(), 'F40:V18': compass_df_Instrument2_bis['F40:V18'].mean(), 'F40:V668': compass_df_Instrument2_bis['F40:V668'].mean(), 'F40:V800': compass_df_Instrument2_bis['F40:V800'].mean(), 'F40:V801': compass_df_Instrument2_bis['F40:V801'].mean(), 
           'F40:V669': compass_df_Instrument2_bis['F40:V669'].mean(), 'isbudgeted': compass_df_Instrument2_bis['isbudgeted'].mean()}
new_row2_isbudgeted = {'CountryLabel':'OECD Mean', 'OECD_STATUS': "Member", 'CountryCode':'OECD2', 'isbudgeted7': compass_df_Instrument2_isbudgeted_bis['isbudgeted7'].mean(), 'isbudgeted8': compass_df_Instrument2_isbudgeted_bis['isbudgeted8'].mean(), 'isbudgeted9': compass_df_Instrument2_isbudgeted_bis['isbudgeted9'].mean(), 
           'isbudgeted10': compass_df_Instrument2_isbudgeted_bis['isbudgeted10'].mean(), 'isbudgeted11': compass_df_Instrument2_isbudgeted_bis['isbudgeted11'].mean(), 'isbudgeted12': compass_df_Instrument2_isbudgeted_bis['isbudgeted12'].mean(), 'isbudgeted13': compass_df_Instrument2_isbudgeted_bis['isbudgeted13'].mean(), 'isbudgeted14': compass_df_Instrument2_isbudgeted_bis['isbudgeted14'].mean(), 'isbudgeted15': compass_df_Instrument2_isbudgeted_bis['isbudgeted15'].mean(), 
           'isbudgeted16': compass_df_Instrument2_isbudgeted_bis['isbudgeted16'].mean(), 'isbudgeted17': compass_df_Instrument2_isbudgeted_bis['isbudgeted17'].mean(), 'isbudgeted18': compass_df_Instrument2_isbudgeted_bis['isbudgeted18'].mean(), 'isbudgeted668': compass_df_Instrument2_isbudgeted_bis['isbudgeted668'].mean(), 'isbudgeted669': compass_df_Instrument2_isbudgeted_bis['isbudgeted669'].mean(), 'isbudgeted800': compass_df_Instrument2_isbudgeted_bis['isbudgeted800'].mean(), 'isbudgeted801': compass_df_Instrument2_isbudgeted_bis['isbudgeted801'].mean()}

    # Add new mean/median lines
compass_df_Instrument2 = compass_df_Instrument2.append(new_row2, ignore_index=True)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.append(new_row2_isbudgeted, ignore_index=True)


    # Lets redo the total as OECD median total isnt correct
compass_df_Instrument2['Total'] = compass_df_Instrument2['F40:V7'] + compass_df_Instrument2['F40:V8'] + compass_df_Instrument2['F40:V9'] + compass_df_Instrument2['F40:V10'] + compass_df_Instrument2['F40:V11'] + compass_df_Instrument2['F40:V12'] + compass_df_Instrument2['F40:V14'] + compass_df_Instrument2['F40:V13'] + compass_df_Instrument2['F40:V15'] + compass_df_Instrument2['F40:V16'] + compass_df_Instrument2['F40:V17'] + compass_df_Instrument2['F40:V18'] + compass_df_Instrument2['F40:V668'] + compass_df_Instrument2['F40:V800'] + compass_df_Instrument2['F40:V801'] + compass_df_Instrument2['F40:V669']

    # Get rid of OECD_STATUS (not needed any more)
compass_df_Instrument2 = compass_df_Instrument2.drop(['OECD_STATUS'], axis=1).copy()
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['OECD_STATUS'], axis=1).copy()


# generate alternative country label with funded initiative / total initiative (for both tabled)
compass_df_Instrument2_Label = compass_df_Instrument2[['CountryLabel', 'Unique', 'isbudgeted']].copy()
compass_df_Instrument2_Label['isbudgeted'] = compass_df_Instrument2_Label['isbudgeted'] * compass_df_Instrument2_Label['Unique']
compass_df_Instrument2_Label['CountryLabel2'] = compass_df_Instrument2_Label['CountryLabel']

    # for table compass_df_Instrument2
for x in range(0, compass_df_Instrument2_Label.shape[0]):
    if compass_df_Instrument2_Label["CountryLabel"].loc[x] == "OECD Mean":
        #compass_df_Instrument2_Label['CountryLabel2'].loc[x] = compass_df_Instrument2_Label['CountryLabel'].loc[x] + " (" +  (round(compass_df_Instrument2_Label['isbudgeted'].loc[x],2).astype(float)).astype(str) + " / " +  (round(compass_df_Instrument2_Label['Unique'].loc[x],2).astype(float)).astype(str) + ")"
        compass_df_Instrument2_Label['CountryLabel2'].loc[x] = compass_df_Instrument2_Label['CountryLabel'].loc[x] + " (" +  (round(compass_df_Instrument2_Label['Unique'].loc[x],2).astype(float)).astype(str) + ")"
    else:
        compass_df_Instrument2_Label['CountryLabel2'].loc[x] = compass_df_Instrument2_Label['CountryLabel'].loc[x] + " (" +  (compass_df_Instrument2_Label['Unique'].loc[x].astype(int)).astype(str) + ")"
compass_df_Instrument2_Label = compass_df_Instrument2_Label.drop(['Unique'], axis=1).copy()


# Prepare the dataframe for graph inclusion:

    # Due to lack of data, some countries must be removed: remove COUNTRY < 3 initiatives:

index_names = compass_df_Instrument2[(compass_df_Instrument2['Unique']) < 3 ].index
compass_df_Instrument2 = compass_df_Instrument2.drop(index_names, inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(index_names, inplace = False)

    # Obtain the data in percent instead:
for x in range(2, compass_df_Instrument2.shape[1]-2):
       compass_df_Instrument2.iloc[:,x] = 100* compass_df_Instrument2.iloc[:,x]/ compass_df_Instrument2['Total']


        # Select the countries to analyse
compass_df_Instrument2 = compass_df_Instrument2.loc[compass_df_Instrument2["CountryCode"].isin(list(my_countries["CountryCode"]))]
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.loc[compass_df_Instrument2_isbudgeted["CountryCode"].isin(list(my_countries["CountryCode"]))]

    # Remove the fields 'Total', 'Unique' and 'isbudgeted' (not really necessary)
compass_df_Instrument2 = compass_df_Instrument2.drop(['Total', 'Unique', 'isbudgeted'], axis=1).copy()

# Rank the factors:
    # Remove the CountryCode:OECD (to exclude it from TOP computation) and store them in another dataframe (we'll need them again)
compass_df_Instrument2_filtered_OECD = compass_df_Instrument2.iloc[compass_df_Instrument2.shape[0]-1:,:]
index_names = compass_df_Instrument2[(compass_df_Instrument2['CountryCode'] == 'OECD') | (compass_df_Instrument2['CountryCode'] == 'OECD2')].index
compass_df_Instrument2.drop(index_names, inplace = True)
compass_df_Instrument2 = compass_df_Instrument2.drop(['CountryCode'], axis=1).copy()
compass_df_Instrument2_filtered_OECD = compass_df_Instrument2_filtered_OECD.drop(['CountryCode'], axis=1).copy()
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['CountryCode'], axis=1).copy()

list_mean_order = {'CountryLabel':'TEST', 'F40:V10': compass_df_Instrument2_bis['F40:V10'].mean(), 'F40:V11': compass_df_Instrument2_bis['F40:V11'].mean(), 'F40:V12': compass_df_Instrument2_bis['F40:V12'].mean(), 'F40:V13': compass_df_Instrument2_bis['F40:V13'].mean(), 
                   'F40:V14': compass_df_Instrument2_bis['F40:V14'].mean(), 'F40:V15': compass_df_Instrument2_bis['F40:V15'].mean(), 'F40:V16': compass_df_Instrument2_bis['F40:V16'].mean(), 'F40:V17': compass_df_Instrument2_bis['F40:V17'].mean(), 'F40:V18': compass_df_Instrument2_bis['F40:V18'].mean(), 'F40:V668': compass_df_Instrument2_bis['F40:V668'].mean(), 
                   'F40:V669': compass_df_Instrument2_bis['F40:V669'].mean(), 'F40:V7': compass_df_Instrument2_bis['F40:V7'].mean(), 'F40:V8': compass_df_Instrument2_bis['F40:V8'].mean(), 'F40:V800': compass_df_Instrument2_bis['F40:V800'].mean(), 'F40:V801': compass_df_Instrument2_bis['F40:V801'].mean(), 'F40:V9': compass_df_Instrument2_bis['F40:V9'].mean()}
list_mean_order_isbudgeted = {'CountryLabel':'TEST', 'isbudgeted10': compass_df_Instrument2_bis['F40:V10'].mean(), 'isbudgeted11': compass_df_Instrument2_bis['F40:V11'].mean(), 'isbudgeted12': compass_df_Instrument2_bis['F40:V12'].mean(), 'isbudgeted13': compass_df_Instrument2_bis['F40:V13'].mean(), 
                   'isbudgeted14': compass_df_Instrument2_bis['F40:V14'].mean(), 'isbudgeted15': compass_df_Instrument2_bis['F40:V15'].mean(), 'isbudgeted16': compass_df_Instrument2_bis['F40:V16'].mean(), 'isbudgeted17': compass_df_Instrument2_bis['F40:V17'].mean(), 'isbudgeted18': compass_df_Instrument2_bis['F40:V18'].mean(), 'isbudgeted668': compass_df_Instrument2_bis['F40:V668'].mean(), 
                   'isbudgeted669': compass_df_Instrument2_bis['F40:V669'].mean(), 'isbudgeted7': compass_df_Instrument2_bis['F40:V7'].mean(), 'isbudgeted8': compass_df_Instrument2_bis['F40:V8'].mean(), 'isbudgeted800': compass_df_Instrument2_bis['F40:V800'].mean(), 'isbudgeted801': compass_df_Instrument2_bis['F40:V801'].mean(), 'isbudgeted9': compass_df_Instrument2_bis['F40:V9'].mean()}

compass_df_Instrument2 = compass_df_Instrument2.append(list_mean_order, ignore_index=True)
compass_df_Instrument2 = compass_df_Instrument2.append(compass_df_Instrument2_filtered_OECD, ignore_index=True)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.append(list_mean_order_isbudgeted, ignore_index=True)

# sort both dataframes
    # sort table compass_df_Instrument2_isbudgeted
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.T
new_header = compass_df_Instrument2_isbudgeted.iloc[0]
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted[1:]
compass_df_Instrument2_isbudgeted.columns = new_header
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.astype(float)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['TEST'], axis=1).copy()

    # sort table compass_df_Instrument2
compass_df_Instrument2 = compass_df_Instrument2.T
new_header = compass_df_Instrument2.iloc[0]
compass_df_Instrument2 = compass_df_Instrument2[1:]
compass_df_Instrument2.columns = new_header
compass_df_Instrument2 = compass_df_Instrument2.astype(float)
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2 = compass_df_Instrument2.drop(['TEST'], axis=1).copy()

# use the index to create a variable called axislocation
My_position = compass_df_Instrument2.copy()
My_position = My_position.reset_index()
My_position = My_position[["level_0", "index"]]
My_position.rename(columns={'level_0': 'axislocation'}, inplace=True)

# melt the two dataframes
compass_df_Instrument3 = compass_df_Instrument2_isbudgeted.copy()
new_header3 = list(compass_df_Instrument2_isbudgeted.columns[1:].values)
compass_df_Instrument3 = compass_df_Instrument3.melt(id_vars=["index"], value_vars=new_header3, var_name='Variable', value_name='Color')

compass_df_Instrument4 = compass_df_Instrument2.copy()
new_header4 = list(compass_df_Instrument4.columns[1:].values)
compass_df_Instrument4 = compass_df_Instrument4.melt(id_vars=["index"], value_vars=new_header4, var_name='Variable', value_name='Value')

compass_df_Instrument2 = compass_df_Instrument4.copy()
compass_df_Instrument2["Color"] = compass_df_Instrument3["Color"]

    # add the axislocation to compass_df_Instrument2

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      My_position, 
                      on ='index', 
                      how ='inner')

    # Rename the columns from Dict_Headers name

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')    
My_position = pd.merge(My_position, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')
compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                        compass_df_Instrument2_Label, 
                        left_on ='Variable', 
                        right_on=['CountryLabel'],
                        how ='inner')
compass_df_Instrument2['Color'] = compass_df_Instrument2['Color'].fillna(0)

compass_df_Instrument2 = compass_df_Instrument2.drop(['Code', 'CountryLabel', 'Variable', 'isbudgeted'], axis=1).copy()
compass_df_Instrument2.rename(columns={'CountryLabel2': 'CountryLabel'}, inplace=True)


# Generate the graph:

import plotly.graph_objects as go

Value = compass_df_Instrument2["Value"]

fig = go.Figure(data=[go.Scatter(
    x= compass_df_Instrument2["axislocation"],
    y= compass_df_Instrument2["CountryLabel"],
    text=compass_df_Instrument2["Value"].map(lambda n: '{:,.2%}'.format(n/100)),
    texttemplate = "{0:.2f}%",
    mode='markers',
    marker=dict(
        size=Value,
        line=dict(width=2,color='DarkSlateGrey'),
        sizemode='area',
        sizeref=2.*max(Value)/(40.**2),
        sizemin=0,
        cmax=1,
        cmin=0,
        color=compass_df_Instrument2["Color"],
        colorscale='YlOrRd',
        showscale=True,
        colorbar=dict(
            title="Instrument funding data:",
            titleside="top",
            tickmode="array",
            tickvals=[0.1, 0.5, 0.9],
            ticktext=["None reported", "Reported for half <br>of instruments", "All reported"],
            tickfont=dict(family="verdana", color='#2a3f5f', size=10),
            title_font_family="verdana",
            title_font_size=13,
            title_font_color='#2a3f5f')
    )
)])

fig.update_layout(
    title={
        'text': "Figure 7: Allocation criteria for institutional funding for public research <br> as a share of total budget-weighted instruments reported",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis=dict(
        showgrid=False,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=0.5,
        tickformat=".0%",
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),
    margin=dict(l=140, r=40, b=50, t=100),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='left',
    ),
    width=1000,
    height=750,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)

fig.update_yaxes(
    tickfont=dict(family="verdana", color='#2a3f5f', size=13),
    ticks="inside",
    tickson="boundaries",
    tickcolor="#444",
    gridwidth=1,
    gridcolor="#444",
    showgrid=True
)

fig.update_xaxes(
    title_text='Allocation criteria considered',
    title_font_family="verdana",
    title_font_size=17,
    title_font_color='#2a3f5f',
    fixedrange=True,
    range=(-0.5, 15.5),
    ticktext= My_position['Long Description'],
    tickvals=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
    showgrid=False,
    gridwidth=1,
    gridcolor="#444",
    ticks="outside",
    tickson="boundaries",
    tickcolor="#444",
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)

fig.show()

Allocation of institutional funding for public research is made using a wide array of criteria. Research impact is the main criteria on average, followed by research publications and outputs, scientific partnerships and collaborations, and alignments with national research priorities. Some country preferences for certain criteria are apparent in the figure. Canada gears towards alignment with national research priorities and “other” (particularly used to indicate that allocations are determined at the federal level), while Japan is particularly geared towards research impacts. Germany favours research publications and the budgets allocated in previous years as funding criteria. Korea and France use a more diverse set of allocation criteria. Research impact and scientific partnership and collaborations are the most prevalent ones in Japan.

4.2. Project grants for public research¶

This second series of graph looks at project grants for public research. These are direct allocations of funding to individual researchers or public research organisations seeking to finance all or part of a research project. They include grant schemes that vary from simplistic, one-off funding allocations to complex strategic programs built on formal public-private partnerships.

4.2.1. Type of research activity funded¶

Project grants for public research can support different kinds of activities. The STIP Compass dataset distinguishes between applied research, basic research, multidisciplinary research, experimental development and demonstration/testing. Figure 8 maps the activities funded by research grants in the selected five countries and the OECD mean. More than one activity can be targeted by a given grant instruments. The frequency of each activity is aggregated using the NBW values. The resulting frequencies are then divided by the total NBW grant instruments reported by the country, to obtain the relative prevalence of each research activity. For each country, each bubble size is proportional to the corresponding research activity's share in the overall pool of NBW project grants for public research.

# Data preparation:

    # Copy the central dataframe
compass_df_Instrument = compass_df.copy()

    # Filter on initiatives linked to "Institutional funding for public research":
compass_df_Instrument = compass_df_Instrument[(compass_df_Instrument.InstrumentTypeLabel == "Project grants for public research")]


    # Generate a table of initiatives by time-period for each country
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode', 'InstrumentTypeLabel', 'InstrumentID'], values=['NBW', 'isbudgeted', 'F76:V33', 'F76:V34', 'F76:V35', 'F76:V804', 'F76:V805'], aggfunc={'NBW': pd.Series.mean, 'isbudgeted': pd.Series.mean, 'F76:V33': sum, 'F76:V34': sum, 'F76:V35': sum, 'F76:V804': sum, 'F76:V805': sum}, fill_value=0).reset_index()

    # Add an extra column to account for missing timeline:
compass_df_Instrument["F76:V31"] = 0

for x in range(5, compass_df_Instrument.shape[1]-3):
    compass_df_Instrument["F76:V31"] = compass_df_Instrument["F76:V31"] + compass_df_Instrument.iloc[:,x]

for x in range(0, compass_df_Instrument.shape[0]):
    if compass_df_Instrument["F76:V31"].loc[x] != 0:
        compass_df_Instrument["F76:V31"].loc[x]  = 0
    else:
        compass_df_Instrument["F76:V31"].loc[x]  = 1
        # Note: we've added "F76:V31" to the dictionary earlier

    # Reorder the columns to have variables on the right:
compass_df_Instrument = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'NBW', 'isbudgeted', 'F76:V33', 'F76:V34', 'F76:V35', 'F76:V804', 'F76:V805', 'F76:V31'])

compass_df_Instrument['isbudgeted33'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F76:V33']
compass_df_Instrument['isbudgeted34'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F76:V34']
compass_df_Instrument['isbudgeted35'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F76:V35']
compass_df_Instrument['isbudgeted804'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F76:V804']
compass_df_Instrument['isbudgeted805'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F76:V805']
compass_df_Instrument['isbudgeted31'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F76:V31']
compass_df_Instrument['total33'] = compass_df_Instrument['F76:V33']
compass_df_Instrument['total34'] = compass_df_Instrument['F76:V34']
compass_df_Instrument['total35'] = compass_df_Instrument['F76:V35']
compass_df_Instrument['total804'] = compass_df_Instrument['F76:V804']
compass_df_Instrument['total805'] = compass_df_Instrument['F76:V805']
compass_df_Instrument['total31'] = compass_df_Instrument['F76:V31']

# Weight each data by NBW
for x in range(6, compass_df_Instrument.shape[1]-12):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] * compass_df_Instrument['NBW']

# Aggregate the data by countries
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode'], values=['isbudgeted', 'InstrumentID', 'isbudgeted33', 'isbudgeted34', 'isbudgeted35', 'isbudgeted804', 'isbudgeted805', 'isbudgeted31', 'total33', 'total34', 'total35', 'total804', 'total805','total31','F76:V33','F76:V34', 'F76:V35','F76:V804','F76:V805','F76:V31'], aggfunc={'isbudgeted': sum, 'InstrumentID':pd.Series.nunique, 'isbudgeted33': sum, 'isbudgeted34': sum, 'isbudgeted35': sum, 'isbudgeted804': sum, 'isbudgeted805': sum, 'isbudgeted31': sum, 'total33': sum, 'total34': sum, 'total35': sum, 'total804': sum, 'total805': sum,'total31': sum,'F76:V33': sum,'F76:V34': sum, 'F76:V35': sum,'F76:V804': sum,'F76:V805': sum,'F76:V31': sum}, margins=True, margins_name='total2', fill_value=0).reset_index()

index_names = compass_df_Instrument[(compass_df_Instrument['OECD_STATUS'] == '')].index
compass_df_Instrument.drop(index_names, inplace = True)
       
compass_df_Instrument['isbudgeted33'] = compass_df_Instrument['isbudgeted33'] / compass_df_Instrument['total33'] 
compass_df_Instrument['isbudgeted34'] = compass_df_Instrument['isbudgeted34'] / compass_df_Instrument['total34'] 
compass_df_Instrument['isbudgeted35'] = compass_df_Instrument['isbudgeted35'] / compass_df_Instrument['total35'] 
compass_df_Instrument['isbudgeted804'] = compass_df_Instrument['isbudgeted804'] / compass_df_Instrument['total804'] 
compass_df_Instrument['isbudgeted805'] = compass_df_Instrument['isbudgeted805'] / compass_df_Instrument['total805'] 
compass_df_Instrument['isbudgeted31'] = compass_df_Instrument['isbudgeted31'] / compass_df_Instrument['total31'] 


# Get rid of intermediate totals (not needed any more)
compass_df_Instrument = compass_df_Instrument.drop(['total33', 'total34', 'total35', 'total804', 'total805', 'total31'], axis=1).copy()

# Generate a dataframe filtered on OECD member only:
compass_df_Instrument_bis = compass_df_Instrument[compass_df_Instrument.OECD_STATUS == "Member"]
new_row = {'CountryLabel':'OECD Mean', 'OECD_STATUS': 'Member', 'CountryCode':'OECD2', 'F76:V31': round(compass_df_Instrument_bis['F76:V31'].mean(),1), 'F76:V33': round(compass_df_Instrument_bis['F76:V33'].mean(),1), 'F76:V34': round(compass_df_Instrument_bis['F76:V34'].mean(),1), 'F76:V35': round(compass_df_Instrument_bis['F76:V35'].mean(),1), 'F76:V804': compass_df_Instrument_bis['F76:V804'].mean(), 'F76:V805': compass_df_Instrument_bis['F76:V805'].mean(), 
           'isbudgeted': compass_df_Instrument_bis['isbudgeted'].mean(), 'isbudgeted31': compass_df_Instrument_bis['isbudgeted31'].mean(), 'isbudgeted33': compass_df_Instrument_bis['isbudgeted33'].mean(), 'isbudgeted34': compass_df_Instrument_bis['isbudgeted34'].mean(), 'isbudgeted35': compass_df_Instrument_bis['isbudgeted35'].mean(), 'isbudgeted804': compass_df_Instrument_bis['isbudgeted804'].mean(), 'isbudgeted805': compass_df_Instrument_bis['isbudgeted805'].mean(), 'InstrumentID': compass_df_Instrument_bis['InstrumentID'].mean()}
new_row2 = {'CountryLabel':'OECD Median', 'OECD_STATUS': 'Member', 'CountryCode':'OECD', 'F76:V31': round(compass_df_Instrument_bis['F76:V31'].median(),1), 'F76:V33': round(compass_df_Instrument_bis['F76:V33'].median(),1), 'F76:V34': round(compass_df_Instrument_bis['F76:V34'].median(),1), 'F76:V35': round(compass_df_Instrument_bis['F76:V35'].median(),1), 'F76:V804': compass_df_Instrument_bis['F76:V804'].median(), 'F76:V805': compass_df_Instrument_bis['F76:V805'].median(), 
           'isbudgeted': compass_df_Instrument_bis['isbudgeted'].mean(), 'isbudgeted31': compass_df_Instrument_bis['isbudgeted31'].median(), 'isbudgeted33': compass_df_Instrument_bis['isbudgeted33'].median(), 'isbudgeted34': compass_df_Instrument_bis['isbudgeted34'].median(), 'isbudgeted35': compass_df_Instrument_bis['isbudgeted35'].median(), 'isbudgeted804': compass_df_Instrument_bis['isbudgeted804'].median(), 'isbudgeted805': compass_df_Instrument_bis['isbudgeted805'].median(), 'InstrumentID': compass_df_Instrument_bis['InstrumentID'].median()}

# Append the OECD data to the dataframes
compass_df_Instrument = compass_df_Instrument.append(new_row, ignore_index=True)


# Generate new labels dataframe
compass_df_Instrument_Label = compass_df_Instrument[['CountryLabel', 'isbudgeted', 'InstrumentID']].copy()
compass_df_Instrument_Label['CountryLabel2'] = compass_df_Instrument_Label['CountryLabel']


compass_df_Instrument_Label['isbudgeted'] = compass_df_Instrument_Label['InstrumentID'] * compass_df_Instrument_Label['isbudgeted']

for x in range(0, compass_df_Instrument_Label.shape[0]):
    if compass_df_Instrument_Label["CountryLabel"].loc[x] == "OECD Mean":
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " (" + (round(compass_df_Instrument_Label['InstrumentID'].loc[x],1).astype(float)).astype(str) + ") "
    else:
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " ("  + (compass_df_Instrument_Label['InstrumentID'].loc[x].astype(int)).astype(str) + ") "
           
# Prepare the dataframe for graph inclusion:
    # Reorder the columns to have variables on the right:
compass_df_Instrument2 = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted33', 'isbudgeted34', 'isbudgeted31', 'isbudgeted35', 'isbudgeted804', 'isbudgeted805','F76:V31', 'F76:V33', 'F76:V34', 'F76:V35', 'F76:V804', 'F76:V805'])

    # generate alternative country label with funded initiative / total initiative (for both tabled)
compass_df_Instrument2_Label = compass_df_Instrument2[['CountryLabel', 'InstrumentID', 'isbudgeted']].copy()
compass_df_Instrument2_Label['isbudgeted'] = compass_df_Instrument2_Label['isbudgeted']
compass_df_Instrument2_Label['CountryLabel2'] = compass_df_Instrument2_Label['CountryLabel']

    # for table compass_df_Instrument2
for x in range(0, compass_df_Instrument2_Label.shape[0]):
    if compass_df_Instrument2_Label["CountryLabel"].loc[x] == "OECD Mean":
        compass_df_Instrument2_Label['CountryLabel2'].loc[x] = compass_df_Instrument2_Label['CountryLabel'].loc[x] + " (" +  (round(compass_df_Instrument2_Label['isbudgeted'].loc[x],2).astype(float)).astype(str) + " / " +  (round(compass_df_Instrument2_Label['InstrumentID'].loc[x],2).astype(float)).astype(str) + ")"
    else:
        compass_df_Instrument2_Label['CountryLabel2'].loc[x] = compass_df_Instrument2_Label['CountryLabel'].loc[x] + " (" +  (compass_df_Instrument2_Label['isbudgeted'].loc[x].astype(int)).astype(str) + " / " +  (compass_df_Instrument2_Label['InstrumentID'].loc[x].astype(int)).astype(str) + ")"

# The dataframe obtained shows counts of initiatives - Compute the proportion instead
compass_df_Instrument2["F76:V32"] = 0
for x in range(11, compass_df_Instrument2.shape[1]-1):
    compass_df_Instrument2["F76:V32"] = compass_df_Instrument2["F76:V32"] + compass_df_Instrument2.iloc[:,x]
for x in range(11, compass_df_Instrument2.shape[1]):
    compass_df_Instrument2.iloc[:,x] = compass_df_Instrument2.iloc[:,x] / compass_df_Instrument2["F76:V32"] #compass_df_Instrument.iloc[:,compass_df_Instrument.shape[1]]   

# Select the countries to analyse
compass_df_Instrument2 = compass_df_Instrument2.loc[compass_df_Instrument["CountryCode"].isin(list(my_countries["CountryCode"]))]

    # Trim the dataframe to only core data and transpose it
compass_df_Instrument2 = compass_df_Instrument2[['CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted31', 'isbudgeted33', 'isbudgeted34', 'isbudgeted35', 'isbudgeted804', 'isbudgeted805','F76:V31', 'F76:V33', 'F76:V34', 'F76:V35', 'F76:V804', 'F76:V805']].copy()

    # rename the InstrumentID to total
compass_df_Instrument2 = compass_df_Instrument2.rename(columns={"InstrumentID":"Total"})

    # Aggregate data by countries in two distinct dataframe
compass_df_Instrument2_isbudgeted = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'CountryCode'], values=['isbudgeted31', 'isbudgeted33', 'isbudgeted34', 'isbudgeted35', 'isbudgeted804', 'isbudgeted805'], aggfunc={'isbudgeted31': sum, 'isbudgeted33': sum, 'isbudgeted34': sum, 'isbudgeted35': sum, 'isbudgeted804': sum, 'isbudgeted805': sum}, fill_value=0).reset_index()
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='CountryLabel', ascending=True)

compass_df_Instrument2 = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'CountryCode'], values=['Total', 'isbudgeted', 'F76:V31', 'F76:V33', 'F76:V34', 'F76:V35', 'F76:V804', 'F76:V805'], aggfunc={'Total': sum, 'isbudgeted': pd.Series.mean, 'F76:V31': sum, 'F76:V33': sum, 'F76:V34': sum, 'F76:V35': sum, 'F76:V804': sum, 'F76:V805': sum}, fill_value=0).reset_index()
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='CountryLabel', ascending=True)
compass_df_Instrument2['Total'] = compass_df_Instrument2['F76:V31'] + compass_df_Instrument2['F76:V33'] + compass_df_Instrument2['F76:V34'] + compass_df_Instrument2['F76:V35'] + compass_df_Instrument2['F76:V804'] + compass_df_Instrument2['F76:V805']

# Prepare the dataframe for graph inclusion:

    # Due to lack of data, some countries must be removed: remove COUNTRY < 3 initiatives:

index_names = compass_df_Instrument2[(compass_df_Instrument2['isbudgeted']) < 3 ].index
compass_df_Instrument2 = compass_df_Instrument2.drop(index_names, inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(index_names, inplace = False)

    # Obtain the data in percent instead:
for x in range(2, compass_df_Instrument2.shape[1]-2):
       compass_df_Instrument2.iloc[:,x] = 100* compass_df_Instrument2.iloc[:,x]/ compass_df_Instrument2['Total']


# Rank the factors:
    # Remove the CountryCode:OECD (to exclude it from TOP computation) and store them in another dataframe (we'll need them again)
compass_df_Instrument2_filtered_OECD = compass_df_Instrument2.iloc[compass_df_Instrument2.shape[0]-1:,:]
index_names = compass_df_Instrument2[(compass_df_Instrument2['CountryCode'] == 'OECD') | (compass_df_Instrument2['CountryCode'] == 'OECD2')].index
compass_df_Instrument2.drop(index_names, inplace = True)

list_mean_order = {'CountryLabel':'TEST', 'F76:V31': compass_df_Instrument2['F76:V31'].mean(), 'F76:V33': compass_df_Instrument2['F76:V33'].mean(), 'F76:V34': compass_df_Instrument2['F76:V34'].mean(), 'F76:V35': compass_df_Instrument2['F76:V35'].mean(), 
                   'F76:V804': compass_df_Instrument2['F76:V804'].mean(), 'F76:V805': compass_df_Instrument2['F76:V805'].mean()}

list_mean_order_isbudgeted = {'CountryLabel':'TEST', 'isbudgeted31': compass_df_Instrument2['F76:V31'].mean(), 'isbudgeted33': compass_df_Instrument2['F76:V33'].mean(), 'isbudgeted34': compass_df_Instrument2['F76:V34'].mean(), 'isbudgeted35': compass_df_Instrument2['F76:V35'].mean(), 
                   'isbudgeted804': compass_df_Instrument2['F76:V804'].mean(), 'isbudgeted805': compass_df_Instrument2['F76:V805'].mean()}

compass_df_Instrument2 = compass_df_Instrument2.append(list_mean_order, ignore_index=True)
compass_df_Instrument2 = compass_df_Instrument2.append(compass_df_Instrument2_filtered_OECD, ignore_index=True)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.append(list_mean_order_isbudgeted, ignore_index=True)

compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['CountryCode'], axis=1).copy()
compass_df_Instrument2 = compass_df_Instrument2.drop(['CountryCode', 'isbudgeted', 'Total'], axis=1).copy()

# sort both dataframes
    # sort table compass_df_Instrument2_isbudgeted
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.T
new_header = compass_df_Instrument2_isbudgeted.iloc[0]
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted[1:]
compass_df_Instrument2_isbudgeted.columns = new_header
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.astype(float)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['TEST'], axis=1).copy()

    # sort table compass_df_Instrument2
compass_df_Instrument2 = compass_df_Instrument2.T
new_header = compass_df_Instrument2.iloc[0]
compass_df_Instrument2 = compass_df_Instrument2[1:]
compass_df_Instrument2.columns = new_header
compass_df_Instrument2 = compass_df_Instrument2.astype(float)
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2 = compass_df_Instrument2.drop(['TEST'], axis=1).copy()

# use the index to create a variable called axislocation
My_position = compass_df_Instrument2.copy()
My_position = My_position.reset_index()
My_position = My_position[["level_0", "index"]]
My_position.rename(columns={'level_0': 'axislocation'}, inplace=True)


# melt the two dataframes
compass_df_Instrument3 = compass_df_Instrument2_isbudgeted.copy()
new_header3 = list(compass_df_Instrument2_isbudgeted.columns[1:].values)
compass_df_Instrument3 = compass_df_Instrument3.melt(id_vars=["index"], value_vars=new_header3, var_name='Variable', value_name='Color')

compass_df_Instrument4 = compass_df_Instrument2.copy()
new_header4 = list(compass_df_Instrument4.columns[1:].values)
compass_df_Instrument4 = compass_df_Instrument4.melt(id_vars=["index"], value_vars=new_header4, var_name='Variable', value_name='Value')

compass_df_Instrument2 = compass_df_Instrument4.copy()
compass_df_Instrument2["Color"] = compass_df_Instrument3["Color"]

    # add the axislocation to compass_df_Instrument2

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      My_position, 
                      on ='index', 
                      how ='inner')

    # Rename the columns from Dict_Headers name

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')    
My_position = pd.merge(My_position, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')
compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                        compass_df_Instrument_Label, 
                        left_on ='Variable', 
                        right_on=['CountryLabel'],
                        how ='inner')
compass_df_Instrument2['Color'] = compass_df_Instrument2['Color'].fillna(0)

compass_df_Instrument2 = compass_df_Instrument2.drop(['Code', 'CountryLabel', 'Variable', 'isbudgeted'], axis=1).copy()
compass_df_Instrument2.rename(columns={'CountryLabel2': 'CountryLabel'}, inplace=True)



# Generate the graph:

import plotly.graph_objects as go
compass_df_Instrument2["Value"] = compass_df_Instrument2["Value"]/100
Value = compass_df_Instrument2["Value"]

fig = go.Figure(data=[go.Scatter(
    x= compass_df_Instrument2["axislocation"],
    y= compass_df_Instrument2["CountryLabel"],
    text=compass_df_Instrument2["Value"].map(lambda n: '{:,.2%}'.format(n)),
    texttemplate = "{0:.2f}%",
    mode='markers',
    marker=dict(
        size=Value,
        line=dict(width=2,color='DarkSlateGrey'),
        sizemode='area',
        sizeref=2.*max(Value)/(40.**2),
        sizemin=0,
        cmax=1,
        cmin=0,
        color=compass_df_Instrument2["Color"],
        colorscale='YlOrRd',
        showscale=True,
        colorbar=dict(
            title="Instrument funding data:",
            titleside="top",
            tickmode="array",
            tickvals=[0.1, 0.5, 0.9],
            ticktext=["None reported", "Reported for half <br>of instruments", "All reported"],
            tickfont=dict(family="verdana", color='#2a3f5f', size=10),
            title_font_family="verdana",
            title_font_size=13,
            title_font_color='#2a3f5f')
    )
)])

fig.update_layout(
    title={
        'text': "Figure 8: Type of research activity funded by project grants for public research  <br> as a share of total budget-weighted instruments reported",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis=dict(
        showgrid=False,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=0.5,
        tickformat=".0%",
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),
    margin=dict(l=140, r=40, b=50, t=100),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='left',
    ),
    width=800,
    height=550,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)

fig.update_yaxes(
    tickfont=dict(family="verdana", color='#2a3f5f', size=13),
    ticks="inside",
    tickson="boundaries",
    tickcolor="#444",
    gridwidth=1,
    gridcolor="#444",
    showgrid=True
)

fig.update_xaxes(
    title_text='Type of research activity funded',
    title_font_family="verdana",
    title_font_size=17,
    title_font_color='#2a3f5f',
    fixedrange=True,
    range=(min(My_position["axislocation"])-1, max(My_position["axislocation"]) +1 ),
    ticktext=My_position['Long Description'], 
    tickvals=My_position["axislocation"],
    showgrid=False,
    gridwidth=1,
    gridcolor="#444",
    ticks="outside",
    tickson="boundaries",
    tickcolor="#444",
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)
fig.show()

In the OECD area, applied research is the main funded activity, followed closely by basic research. Four out of five of the selected countries follow this same pattern, with the exception of France, where basic research is relatively more funded than applied research. The prevalence of basic research funding seems lower in Japan compared to others. However, this should be interpreted carefully as Korea and Japan reported fewer instruments compared to the OECD mean. Multidisciplinary research stands out as a prevalent activity in Germany in relation to those of other countries. Experimental development and demonstration/testing are cited less often.

4.2.2. Forms of collaboration¶

Figure 9 focuses on the types of collaborations required to allocate grants for public research. These grants at times require the participation of industry partners, international partners, users of research outputs, or other public research actors. . The frequency of each form of collaboration is aggregated using the NBW values. The resulting frequencies are then divided by the total NBW grant instruments reported by the country, to obtain the relative prevalence of each type of collaboration. For each country, each bubble size is proportional to the corresponding collaboration's share in the overall pool of NBW project grants for public research.

# Data preparation:

    # Copy the central dataframe
compass_df_Instrument = compass_df.copy()

    # Filter on initiatives linked to "Institutional funding for public research":
compass_df_Instrument = compass_df_Instrument[(compass_df_Instrument.InstrumentTypeLabel == "Project grants for public research")]

#Aggregate data by InstrumentID
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode', 'InstrumentID'], values=['NBW', 'isbudgeted', 'F77:V37', 'F77:V38', 'F77:V39', 'F77:V40', 'F77:V806', 'F77:V41'], aggfunc={'NBW': sum, 'isbudgeted': pd.Series.mean, 'F77:V37':sum, 'F77:V38':sum, 'F77:V39':sum, 'F77:V40':sum, 'F77:V806':sum, 'F77:V41': sum}, fill_value=0).reset_index()

    # Add an extra column to account for missing values:
compass_df_Instrument["F77:V31"] = 0

for x in range(5, compass_df_Instrument.shape[1]-3):
    compass_df_Instrument["F77:V31"] = compass_df_Instrument["F77:V31"] + compass_df_Instrument.iloc[:,x]

for x in range(0, compass_df_Instrument.shape[0]):
    if compass_df_Instrument["F77:V31"].loc[x] != 0:
        compass_df_Instrument["F77:V31"].loc[x]  = 0
    else:
        compass_df_Instrument["F77:V31"].loc[x]  = 1
        # Note: we've added "F77:V31" to the dictionary earlier

    # Reorder the columns to have variables on the right:
compass_df_Instrument = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'NBW', 'isbudgeted', 'F77:V37', 'F77:V38', 'F77:V39', 'F77:V40', 'F77:V41', 'F77:V806'])
compass_df_Instrument['isbudgeted37'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F77:V37']
compass_df_Instrument['isbudgeted38'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F77:V38']
compass_df_Instrument['isbudgeted39'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F77:V39']
compass_df_Instrument['isbudgeted40'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F77:V40']
compass_df_Instrument['isbudgeted41'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F77:V41']
compass_df_Instrument['isbudgeted806'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F77:V806']
compass_df_Instrument['total37'] = compass_df_Instrument['F77:V37']
compass_df_Instrument['total38'] = compass_df_Instrument['F77:V38']
compass_df_Instrument['total39'] = compass_df_Instrument['F77:V39']
compass_df_Instrument['total40'] = compass_df_Instrument['F77:V40']
compass_df_Instrument['total41'] = compass_df_Instrument['F77:V41']
compass_df_Instrument['total806'] = compass_df_Instrument['F77:V806']

# Weight each data by NBW
for x in range(6, compass_df_Instrument.shape[1]-12):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] * compass_df_Instrument['NBW']

# Aggregate the data by countries
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode'], values=['isbudgeted', 'InstrumentID', 'isbudgeted37', 'isbudgeted38', 'isbudgeted39', 'isbudgeted40', 'isbudgeted41', 'isbudgeted806', 'F77:V37', 'F77:V38', 'F77:V39', 'F77:V40', 'F77:V41', 'F77:V806', 'total37', 'total38', 'total39', 'total40', 'total41', 'total806'], aggfunc={'isbudgeted': sum, 'InstrumentID':pd.Series.nunique, 'isbudgeted37': sum, 'isbudgeted38': sum, 'isbudgeted39': sum, 'isbudgeted40': sum, 'isbudgeted41': sum, 'isbudgeted806': sum, 'F77:V37': sum, 'F77:V38': sum, 'F77:V39': sum, 'F77:V40': sum, 'F77:V41': sum, 'F77:V806': sum, 'total37': sum, 'total38': sum, 'total39': sum, 'total40': sum, 'total41': sum, 'total806': sum}, margins=True, margins_name='total2', fill_value=0).reset_index()
index_names = compass_df_Instrument[(compass_df_Instrument['OECD_STATUS'] == '')].index
compass_df_Instrument.drop(index_names, inplace = True)

compass_df_Instrument['isbudgeted37'] = compass_df_Instrument['isbudgeted37'] / compass_df_Instrument['total37'] 
compass_df_Instrument['isbudgeted38'] = compass_df_Instrument['isbudgeted38'] / compass_df_Instrument['total38'] 
compass_df_Instrument['isbudgeted39'] = compass_df_Instrument['isbudgeted39'] / compass_df_Instrument['total39'] 
compass_df_Instrument['isbudgeted40'] = compass_df_Instrument['isbudgeted40'] / compass_df_Instrument['total40'] 
compass_df_Instrument['isbudgeted41'] = compass_df_Instrument['isbudgeted41'] / compass_df_Instrument['total41'] 
compass_df_Instrument['isbudgeted806'] = compass_df_Instrument['isbudgeted806'] / compass_df_Instrument['total806'] 

# Get rid of intermediate totals (not needed any more)
compass_df_Instrument = compass_df_Instrument.drop(['total37', 'total38', 'total39', 'total40', 'total41', 'total806'], axis=1).copy()

# Generate a dataframe filtered on OECD member only:
compass_df_Instrument_bis = compass_df_Instrument[compass_df_Instrument.OECD_STATUS == "Member"]
new_row = {'CountryLabel':'OECD Mean', 'OECD_STATUS': 'Member', 'CountryCode':'OECD2', 'F77:V37': round(compass_df_Instrument_bis['F77:V37'].mean(),1), 'F77:V38': round(compass_df_Instrument_bis['F77:V38'].mean(),1), 'F77:V39': round(compass_df_Instrument_bis['F77:V39'].mean(),1), 'F77:V40': round(compass_df_Instrument_bis['F77:V40'].mean(),1), 'F77:V41': compass_df_Instrument_bis['F77:V41'].mean(), 'F77:V806': compass_df_Instrument_bis['F77:V806'].mean(), 
           'isbudgeted': compass_df_Instrument_bis['isbudgeted'].mean(), 'isbudgeted37': compass_df_Instrument_bis['isbudgeted37'].mean(), 'isbudgeted38': compass_df_Instrument_bis['isbudgeted38'].mean(), 'isbudgeted39': compass_df_Instrument_bis['isbudgeted39'].mean(), 'isbudgeted40': compass_df_Instrument_bis['isbudgeted40'].mean(), 'isbudgeted41': compass_df_Instrument_bis['isbudgeted41'].mean(), 'isbudgeted806': compass_df_Instrument_bis['isbudgeted806'].mean(), 'InstrumentID': compass_df_Instrument_bis['InstrumentID'].mean()}
new_row2 = {'CountryLabel':'OECD Median', 'OECD_STATUS': 'Member', 'CountryCode':'OECD', 'F77:V37': round(compass_df_Instrument_bis['F77:V37'].median(),1), 'F77:V38': round(compass_df_Instrument_bis['F77:V38'].median(),1), 'F77:V39': round(compass_df_Instrument_bis['F77:V39'].median(),1), 'F77:V40': round(compass_df_Instrument_bis['F77:V40'].median(),1), 'F77:V41': compass_df_Instrument_bis['F77:V41'].median(), 'F77:V806': compass_df_Instrument_bis['F77:V806'].median(), 
           'isbudgeted': compass_df_Instrument_bis['isbudgeted'].mean(), 'isbudgeted37': compass_df_Instrument_bis['isbudgeted37'].median(), 'isbudgeted38': compass_df_Instrument_bis['isbudgeted38'].median(), 'isbudgeted39': compass_df_Instrument_bis['isbudgeted39'].median(), 'isbudgeted40': compass_df_Instrument_bis['isbudgeted40'].median(), 'isbudgeted41': compass_df_Instrument_bis['isbudgeted41'].median(), 'isbudgeted806': compass_df_Instrument_bis['isbudgeted806'].median(), 'InstrumentID': compass_df_Instrument_bis['InstrumentID'].median()}

# Append the OECD data to the dataframes
compass_df_Instrument = compass_df_Instrument.append(new_row, ignore_index=True)

# Generate new labels dataframe
compass_df_Instrument_Label = compass_df_Instrument[['CountryLabel', 'isbudgeted', 'InstrumentID']].copy()
compass_df_Instrument_Label['CountryLabel2'] = compass_df_Instrument_Label['CountryLabel']
compass_df_Instrument_Label['isbudgeted'] = compass_df_Instrument_Label['InstrumentID'] * compass_df_Instrument_Label['isbudgeted']

for x in range(0, compass_df_Instrument_Label.shape[0]):
    if compass_df_Instrument_Label["CountryLabel"].loc[x] == "OECD Mean":
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " (" + (round(compass_df_Instrument_Label['InstrumentID'].loc[x],1).astype(float)).astype(str) + ") "
    else:
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " ("  + (compass_df_Instrument_Label['InstrumentID'].loc[x].astype(int)).astype(str) + ") "     

# Prepare the dataframe for graph inclusion:
    # Reorder the columns to have variables on the right:
compass_df_Instrument = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted37', 'isbudgeted38', 'isbudgeted39', 'isbudgeted40', 'isbudgeted41', 'isbudgeted806','F77:V37', 'F77:V38', 'F77:V39', 'F77:V40', 'F77:V41', 'F77:V806'])

    # generate alternative country label with funded initiative / total initiative (for both tabled)
compass_df_Instrument2_Label = compass_df_Instrument[['CountryLabel', 'InstrumentID', 'isbudgeted']].copy()
compass_df_Instrument2_Label['isbudgeted'] = compass_df_Instrument2_Label['isbudgeted']
compass_df_Instrument2_Label['CountryLabel2'] = compass_df_Instrument2_Label['CountryLabel']
for x in range(0, compass_df_Instrument2_Label.shape[0]):
    if compass_df_Instrument2_Label["CountryLabel"].loc[x] == "OECD Mean":
        compass_df_Instrument2_Label['CountryLabel2'].loc[x] = compass_df_Instrument2_Label['CountryLabel'].loc[x] + " (" +  (round(compass_df_Instrument2_Label['isbudgeted'].loc[x],2).astype(float)).astype(str) + " / " +  (round(compass_df_Instrument2_Label['InstrumentID'].loc[x],2).astype(float)).astype(str) + ")"
    else:
        compass_df_Instrument2_Label['CountryLabel2'].loc[x] = compass_df_Instrument2_Label['CountryLabel'].loc[x] + " (" +  (compass_df_Instrument2_Label['isbudgeted'].loc[x].astype(int)).astype(str) + " / " +  (compass_df_Instrument2_Label['InstrumentID'].loc[x].astype(int)).astype(str) + ")"

    # The dataframe obtained shows counts of initiatives for each time horizon - Compute the proportion instead
compass_df_Instrument["F77:V32"] = 0
for x in range(11, compass_df_Instrument.shape[1]-1):
    compass_df_Instrument["F77:V32"] = compass_df_Instrument["F77:V32"] + compass_df_Instrument.iloc[:,x]
for x in range(11, compass_df_Instrument.shape[1]):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] / compass_df_Instrument["F77:V32"] #compass_df_Instrument.iloc[:,compass_df_Instrument.shape[1]]

    # Select the countries to analyse
compass_df_Instrument2 = compass_df_Instrument.loc[compass_df_Instrument["CountryCode"].isin(list(my_countries["CountryCode"]))]

    # Trim the dataframe to only core data
compass_df_Instrument2 = compass_df_Instrument2[['CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted37', 'isbudgeted38', 'isbudgeted39', 'isbudgeted40', 'isbudgeted41', 'isbudgeted806','F77:V37', 'F77:V38', 'F77:V39', 'F77:V40', 'F77:V41', 'F77:V806']].copy()

    # rename the InstrumentID to total
compass_df_Instrument2 = compass_df_Instrument2.rename(columns={"InstrumentID":"Total"})

    # Aggregate data by countries in two distinct dataframe
compass_df_Instrument2_isbudgeted = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'CountryCode'], values=['isbudgeted37', 'isbudgeted38', 'isbudgeted39', 'isbudgeted40', 'isbudgeted41', 'isbudgeted806'], aggfunc={'isbudgeted37': sum, 'isbudgeted38': sum, 'isbudgeted39': sum, 'isbudgeted40': sum, 'isbudgeted41': sum, 'isbudgeted806': sum}, fill_value=0).reset_index()
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='CountryLabel', ascending=True)

compass_df_Instrument2 = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'CountryCode'], values=['Total', 'isbudgeted', 'F77:V37', 'F77:V38', 'F77:V39', 'F77:V40', 'F77:V41', 'F77:V806'], aggfunc={'Total': sum, 'isbudgeted': pd.Series.mean, 'F77:V37': sum, 'F77:V38': sum, 'F77:V39': sum, 'F77:V40': sum, 'F77:V41': sum, 'F77:V806': sum}, fill_value=0).reset_index()
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='CountryLabel', ascending=True)
compass_df_Instrument2['Total'] = compass_df_Instrument2['F77:V37'] + compass_df_Instrument2['F77:V38'] + compass_df_Instrument2['F77:V39'] + compass_df_Instrument2['F77:V40'] + compass_df_Instrument2['F77:V41'] + compass_df_Instrument2['F77:V806']

# Prepare the dataframe for graph inclusion:

    # Due to lack of data, some countries must be removed: remove COUNTRY < 3 initiatives:

index_names = compass_df_Instrument2[(compass_df_Instrument2['isbudgeted']) < 3 ].index
compass_df_Instrument2 = compass_df_Instrument2.drop(index_names, inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(index_names, inplace = False)

    # Obtain the data in percent instead:
for x in range(2, compass_df_Instrument2.shape[1]-2):
       compass_df_Instrument2.iloc[:,x] = 100* compass_df_Instrument2.iloc[:,x]/ compass_df_Instrument2['Total']

# Rank the factors:
    # Remove the CountryCode:OECD (to exclude it from TOP computation) and store them in another dataframe (we'll need them again)
compass_df_Instrument2_filtered_OECD = compass_df_Instrument2.iloc[compass_df_Instrument2.shape[0]-1:,:]
index_names = compass_df_Instrument2[(compass_df_Instrument2['CountryCode'] == 'OECD') | (compass_df_Instrument2['CountryCode'] == 'OECD2')].index
compass_df_Instrument2.drop(index_names, inplace = True)

list_mean_order = {'CountryLabel':'TEST', 'F77:V37': compass_df_Instrument2['F77:V37'].mean(), 'F77:V38': compass_df_Instrument2['F77:V38'].mean(), 'F77:V39': compass_df_Instrument2['F77:V39'].mean(), 'F77:V40': compass_df_Instrument2['F77:V40'].mean(), 
                   'F77:V41': compass_df_Instrument2['F77:V41'].mean(), 'F77:V806': compass_df_Instrument2['F77:V806'].mean()}

list_mean_order_isbudgeted = {'CountryLabel':'TEST', 'isbudgeted37': compass_df_Instrument2['F77:V37'].mean(), 'isbudgeted38': compass_df_Instrument2['F77:V38'].mean(), 'isbudgeted39': compass_df_Instrument2['F77:V39'].mean(), 'isbudgeted40': compass_df_Instrument2['F77:V40'].mean(), 
                   'isbudgeted41': compass_df_Instrument2['F77:V41'].mean(), 'isbudgeted806': compass_df_Instrument2['F77:V806'].mean()}

compass_df_Instrument2 = compass_df_Instrument2.append(list_mean_order, ignore_index=True)
compass_df_Instrument2 = compass_df_Instrument2.append(compass_df_Instrument2_filtered_OECD, ignore_index=True)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.append(list_mean_order_isbudgeted, ignore_index=True)

compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['CountryCode'], axis=1).copy()
compass_df_Instrument2 = compass_df_Instrument2.drop(['CountryCode', 'isbudgeted', 'Total'], axis=1).copy()

# sort both dataframes
    # sort table compass_df_Instrument2_isbudgeted
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.T
new_header = compass_df_Instrument2_isbudgeted.iloc[0]
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted[1:]
compass_df_Instrument2_isbudgeted.columns = new_header
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.astype(float)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['TEST'], axis=1).copy()

    # sort table compass_df_Instrument2
compass_df_Instrument2 = compass_df_Instrument2.T
new_header = compass_df_Instrument2.iloc[0]
compass_df_Instrument2 = compass_df_Instrument2[1:]
compass_df_Instrument2.columns = new_header
compass_df_Instrument2 = compass_df_Instrument2.astype(float)
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2 = compass_df_Instrument2.drop(['TEST'], axis=1).copy()

# use the index to create a variable called axislocation
My_position = compass_df_Instrument2.copy()
My_position = My_position.reset_index()
My_position = My_position[["level_0", "index"]]
My_position.rename(columns={'level_0': 'axislocation'}, inplace=True)


# melt the two dataframes
compass_df_Instrument3 = compass_df_Instrument2_isbudgeted.copy()
new_header3 = list(compass_df_Instrument2_isbudgeted.columns[1:].values)
compass_df_Instrument3 = compass_df_Instrument3.melt(id_vars=["index"], value_vars=new_header3, var_name='Variable', value_name='Color')

compass_df_Instrument4 = compass_df_Instrument2.copy()
new_header4 = list(compass_df_Instrument4.columns[1:].values)
compass_df_Instrument4 = compass_df_Instrument4.melt(id_vars=["index"], value_vars=new_header4, var_name='Variable', value_name='Value')

compass_df_Instrument2 = compass_df_Instrument4.copy()
compass_df_Instrument2["Color"] = compass_df_Instrument3["Color"]

    # add the axislocation to compass_df_Instrument2

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      My_position, 
                      on ='index', 
                      how ='inner')

    # Rename the columns from Dict_Headers name

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')    
My_position = pd.merge(My_position, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')
compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                        compass_df_Instrument_Label, 
                        left_on ='Variable', 
                        right_on=['CountryLabel'],
                        how ='inner')
compass_df_Instrument2['Color'] = compass_df_Instrument2['Color'].fillna(0)
compass_df_Instrument2['Value'] = compass_df_Instrument2['Value']/100

compass_df_Instrument2 = compass_df_Instrument2.drop(['Code', 'CountryLabel', 'Variable', 'isbudgeted'], axis=1).copy()
compass_df_Instrument2.rename(columns={'CountryLabel2': 'CountryLabel'}, inplace=True)



# Generate the graph:

import plotly.graph_objects as go

Value = compass_df_Instrument2["Value"]

fig = go.Figure(data=[go.Scatter(
    x= compass_df_Instrument2["axislocation"],
    y= compass_df_Instrument2["CountryLabel"],
    text=compass_df_Instrument2["Value"].map(lambda n: '{:,.2%}'.format(n)),
    texttemplate = "{0:.2f}%",
    mode='markers',
    marker=dict(
        size=Value,
        line=dict(width=2,color='DarkSlateGrey'),
        sizemode='area',
        sizeref=2.*max(Value)/(40.**2),
        sizemin=0,
        cmax=1,
        cmin=0,
        color=compass_df_Instrument2["Color"],
        colorscale='YlOrRd',
        showscale=True,
        colorbar=dict(
            title="Instrument funding data:",
            titleside="top",
            tickmode="array",
            tickvals=[0.1, 0.5, 0.9],
            ticktext=["None reported", "Reported for half <br>of instruments", "All reported"],
            tickfont=dict(family="verdana", color='#2a3f5f', size=10),
            title_font_family="verdana",
            title_font_size=13,
            title_font_color='#2a3f5f')
    )
)])

fig.update_layout(
    title={
        'text': "Figure 9: Type of research activity funded by project grants for public research  <br> as a share of total budget-weighted instruments reported",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis=dict(
        showgrid=False,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=0.5,
        tickformat=".0%",
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),
    margin=dict(l=140, r=40, b=50, t=100),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='left',
    ),
    width=800,
    height=550,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)

fig.update_yaxes(
    tickfont=dict(family="verdana", color='#2a3f5f', size=13),
    ticks="inside",
    tickson="boundaries",
    tickcolor="#444",
    gridwidth=1,
    gridcolor="#444",
    showgrid=True
)

fig.update_xaxes(
    title_text='Type of research activity funded',
    title_font_family="verdana",
    title_font_size=17,
    title_font_color='#2a3f5f',
    fixedrange=True,
    range=(min(My_position["axislocation"])-1, max(My_position["axislocation"]) +1 ),
    ticktext=My_position['Long Description'],
    tickvals=My_position["axislocation"],
    showgrid=False,
    gridwidth=1,
    gridcolor="#444",
    ticks="outside",
    tickson="boundaries",
    tickcolor="#444",
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)
fig.show()

Collaboration is a common requirement in project grants for public research across countries, both for the selected five countries and the OECD mean. The different forms of collaborations tend to be balanced in the database, with the exceptions of users of research output and with other partners, which are relatively more marginal. As a requirement, collaboration with international partners is relatively more prevalent in France and Japan. Compared to other countries, project grants without any collaboration requirement are most prevalent in Canada.

4.2.3. Selection criteria¶

Figure 10 displays the prevalence of selection criteria for the allocation of research grants. These grants are allocated on a competitive basis, selecting from a number of candidate projects on the basis of certain favoured parameters (e.g. societal impact anticipated and alignment with national research priorities. Using the same approach as above, the chart aggregates the policy NBW instrument data reported by the five selected countries to indicate the prevalence of each criterion among the total pool of research grants reported.

# Data preparation:

    # Copy the central dataframe
compass_df_Instrument = compass_df.copy()

# Filter on initiatives linked to "Institutional funding for public research"
compass_df_Instrument = compass_df_Instrument[(compass_df_Instrument.InstrumentTypeLabel == "Project grants for public research")]

    # Aggregate data by InstrumentID
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode', 'InstrumentID'], values=['NBW', 'isbudgeted', 'F78:V43', 'F78:V44', 'F78:V45', 'F78:V46', 'F78:V807', 'F78:V808', 'F78:V47', 'F78:V48', 'F78:V49', 'F78:V50'], aggfunc={'NBW': pd.Series.mean, 'isbudgeted': pd.Series.mean, 'F78:V43': sum, 'F78:V44': sum, 'F78:V45': sum, 'F78:V46': sum, 'F78:V807': sum, 'F78:V808': sum, 'F78:V47': sum, 'F78:V48': sum, 'F78:V49': sum, 'F78:V50': sum}, margins=True, margins_name='total', fill_value=0).reset_index()

    # Add an extra column to account for missing values:
compass_df_Instrument["F78:V31"] = 0

for x in range(5, compass_df_Instrument.shape[1]-3):
    compass_df_Instrument["F78:V31"] = compass_df_Instrument["F78:V31"] + compass_df_Instrument.iloc[:,x]

for x in range(0, compass_df_Instrument.shape[0]):
    if compass_df_Instrument["F78:V31"].loc[x] != 0:
        compass_df_Instrument["F78:V31"].loc[x]  = 0
    else:
        compass_df_Instrument["F78:V31"].loc[x]  = 1


    # introduce 2 sets of variables, "isbudgetedXX" and "totalXX"
compass_df_Instrument['isbudgeted43'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V43']
compass_df_Instrument['isbudgeted44'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V44']
compass_df_Instrument['isbudgeted45'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V45']
compass_df_Instrument['isbudgeted46'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V46']
compass_df_Instrument['isbudgeted807'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V807']
compass_df_Instrument['isbudgeted808'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V808']
compass_df_Instrument['isbudgeted47'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V47']
compass_df_Instrument['isbudgeted48'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V48']
compass_df_Instrument['isbudgeted49'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V49']
compass_df_Instrument['isbudgeted50'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V50']
compass_df_Instrument['isbudgeted31'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V31']
compass_df_Instrument['total43'] = compass_df_Instrument['F78:V43']
compass_df_Instrument['total44'] = compass_df_Instrument['F78:V44']
compass_df_Instrument['total45'] = compass_df_Instrument['F78:V45']
compass_df_Instrument['total46'] = compass_df_Instrument['F78:V46']
compass_df_Instrument['total807'] = compass_df_Instrument['F78:V807']
compass_df_Instrument['total808'] = compass_df_Instrument['F78:V808']
compass_df_Instrument['total47'] = compass_df_Instrument['F78:V47']
compass_df_Instrument['total48'] = compass_df_Instrument['F78:V48']
compass_df_Instrument['total49'] = compass_df_Instrument['F78:V49']
compass_df_Instrument['total50'] = compass_df_Instrument['F78:V50']
compass_df_Instrument['total31'] = compass_df_Instrument['F78:V31']

    # Weight initiatives by NBW
for x in range(4, (compass_df_Instrument.shape[1])-25):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] * compass_df_Instrument['NBW']

# Aggregate the data by countries
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode'], values=['isbudgeted', 'InstrumentID', 'isbudgeted43', 'isbudgeted44', 'isbudgeted45', 'isbudgeted46', 'isbudgeted807', 'isbudgeted808', 'isbudgeted47', 'isbudgeted48', 'isbudgeted49', 'isbudgeted50', 'isbudgeted31', 'F78:V43', 'F78:V44', 'F78:V45', 'F78:V46', 'F78:V807', 'F78:V808', 'F78:V47', 'F78:V48', 'F78:V49', 'F78:V50', 'F78:V31', 'total43', 'total44', 'total45', 'total46', 'total807', 'total808', 'total47', 'total48', 'total49', 'total50', 'total31'], aggfunc={'isbudgeted': sum, 'InstrumentID':pd.Series.nunique, 'isbudgeted43': sum, 'isbudgeted44': sum, 'isbudgeted45': sum, 'isbudgeted46': sum, 'isbudgeted807': sum, 'isbudgeted808': sum, 'isbudgeted47': sum, 'isbudgeted48': sum, 'isbudgeted49': sum, 'isbudgeted50': sum, 'isbudgeted31': sum, 'F78:V43': sum, 'F78:V44': sum, 'F78:V45': sum, 'F78:V46': sum, 'F78:V807': sum, 'F78:V808': sum, 'F78:V47': sum, 'F78:V48': sum, 'F78:V49': sum, 'F78:V50': sum, 'F78:V31': sum, 'total43': sum, 'total44': sum, 'total45': sum, 'total46': sum, 'total807': sum, 'total808': sum, 'total47': sum, 'total48': sum, 'total49': sum, 'total50': sum, 'total31': sum}, margins=True, margins_name='total2', fill_value=0).reset_index()
index_names = compass_df_Instrument[(compass_df_Instrument['OECD_STATUS'] == '')].index
compass_df_Instrument.drop(index_names, inplace = True)

compass_df_Instrument['isbudgeted43'] = compass_df_Instrument['isbudgeted43'] / compass_df_Instrument['total43']
compass_df_Instrument['isbudgeted44'] = compass_df_Instrument['isbudgeted44'] / compass_df_Instrument['total44']
compass_df_Instrument['isbudgeted45'] = compass_df_Instrument['isbudgeted45'] / compass_df_Instrument['total45']
compass_df_Instrument['isbudgeted46'] = compass_df_Instrument['isbudgeted46'] / compass_df_Instrument['total46']
compass_df_Instrument['isbudgeted807'] = compass_df_Instrument['isbudgeted807'] / compass_df_Instrument['total807']
compass_df_Instrument['isbudgeted808'] = compass_df_Instrument['isbudgeted808'] / compass_df_Instrument['total808']
compass_df_Instrument['isbudgeted47'] = compass_df_Instrument['isbudgeted47'] / compass_df_Instrument['total47']
compass_df_Instrument['isbudgeted48'] = compass_df_Instrument['isbudgeted48'] / compass_df_Instrument['total48']
compass_df_Instrument['isbudgeted49'] = compass_df_Instrument['isbudgeted49'] / compass_df_Instrument['total49']
compass_df_Instrument['isbudgeted50'] = compass_df_Instrument['isbudgeted50'] / compass_df_Instrument['total50']
compass_df_Instrument['isbudgeted31'] = compass_df_Instrument['isbudgeted31'] / compass_df_Instrument['total31']


# Get rid of intermediate totals (not needed any more)
compass_df_Instrument = compass_df_Instrument.drop(['total43', 'total44', 'total45', 'total46', 'total807', 'total808', 'total47', 'total48', 'total49', 'total50', 'total31'], axis=1).copy()

# Generate a dataframe filtered on OECD member only:
compass_df_Instrument_bis = compass_df_Instrument[compass_df_Instrument.OECD_STATUS == "Member"]

new_row = {'CountryLabel':'OECD Mean', 'OECD_STATUS': 'Member', 'CountryCode':'OECD2', 'InstrumentID': compass_df_Instrument_bis['InstrumentID'].mean(), 'F78:V31': compass_df_Instrument_bis['F78:V31'].mean(), 'F78:V43': compass_df_Instrument_bis['F78:V43'].mean(), 
           'F78:V44': compass_df_Instrument_bis['F78:V44'].mean(), 'F78:V45': compass_df_Instrument_bis['F78:V45'].mean(), 'F78:V46': compass_df_Instrument_bis['F78:V46'].mean(), 'F78:V807': compass_df_Instrument_bis['F78:V807'].mean(), 'F78:V808': compass_df_Instrument_bis['F78:V808'].mean(), 'F78:V47': compass_df_Instrument_bis['F78:V47'].mean(), 
           'F78:V48': compass_df_Instrument_bis['F78:V48'].mean(), 'F78:V49': compass_df_Instrument_bis['F78:V49'].mean(), 'F78:V50': compass_df_Instrument_bis['F78:V50'].mean(), 'isbudgeted31': compass_df_Instrument_bis['isbudgeted31'].mean(), 'isbudgeted43': compass_df_Instrument_bis['isbudgeted43'].mean(), 'isbudgeted44': compass_df_Instrument_bis['isbudgeted44'].mean(), 'isbudgeted45': compass_df_Instrument_bis['isbudgeted45'].mean(), 
           'isbudgeted46': compass_df_Instrument_bis['isbudgeted46'].mean(), 'isbudgeted807': compass_df_Instrument_bis['isbudgeted807'].mean(), 'isbudgeted808': compass_df_Instrument_bis['isbudgeted808'].mean(), 'isbudgeted47': compass_df_Instrument_bis['isbudgeted47'].mean(), 'isbudgeted48': compass_df_Instrument_bis['isbudgeted48'].mean(), 'isbudgeted49': compass_df_Instrument_bis['isbudgeted49'].mean(), 
           'isbudgeted50': compass_df_Instrument_bis['isbudgeted50'].mean(), 'isbudgeted': compass_df_Instrument_bis['isbudgeted'].mean()}

new_row2 = {'CountryLabel':'OECD Median', 'OECD_STATUS': 'Member', 'CountryCode':'OECD2', 'InstrumentID': compass_df_Instrument_bis['InstrumentID'].median(), 'F78:V31': compass_df_Instrument_bis['F78:V31'].median(), 'F78:V43': compass_df_Instrument_bis['F78:V43'].median(), 
           'F78:V44': compass_df_Instrument_bis['F78:V44'].median(), 'F78:V45': compass_df_Instrument_bis['F78:V45'].median(), 'F78:V46': compass_df_Instrument_bis['F78:V46'].median(), 'F78:V807': compass_df_Instrument_bis['F78:V807'].median(), 'F78:V808': compass_df_Instrument_bis['F78:V808'].median(), 'F78:V47': compass_df_Instrument_bis['F78:V47'].median(), 
           'F78:V48': compass_df_Instrument_bis['F78:V48'].median(), 'F78:V49': compass_df_Instrument_bis['F78:V49'].median(), 'F78:V50': compass_df_Instrument_bis['F78:V50'].median(), 'isbudgeted31': compass_df_Instrument_bis['isbudgeted31'].median(), 'isbudgeted43': compass_df_Instrument_bis['isbudgeted43'].median(), 'isbudgeted44': compass_df_Instrument_bis['isbudgeted44'].median(), 'isbudgeted45': compass_df_Instrument_bis['isbudgeted45'].median(), 
           'isbudgeted46': compass_df_Instrument_bis['isbudgeted46'].median(), 'isbudgeted807': compass_df_Instrument_bis['isbudgeted807'].median(), 'isbudgeted808': compass_df_Instrument_bis['isbudgeted808'].median(), 'isbudgeted47': compass_df_Instrument_bis['isbudgeted47'].median(), 'isbudgeted48': compass_df_Instrument_bis['isbudgeted48'].median(), 'isbudgeted49': compass_df_Instrument_bis['isbudgeted49'].median(), 
           'isbudgeted50': compass_df_Instrument_bis['isbudgeted50'].median(), 'isbudgeted': compass_df_Instrument_bis['isbudgeted'].median()}

# Append the OECD data to the dataframes
compass_df_Instrument = compass_df_Instrument.append(new_row, ignore_index=True)


# Generate new labels dataframe
compass_df_Instrument_Label = compass_df_Instrument[['CountryLabel', 'isbudgeted', 'InstrumentID']].copy()
compass_df_Instrument_Label['CountryLabel2'] = compass_df_Instrument_Label['CountryLabel']
compass_df_Instrument_Label['isbudgeted'] = compass_df_Instrument_Label['InstrumentID'] * compass_df_Instrument_Label['isbudgeted']
for x in range(0, compass_df_Instrument_Label.shape[0]):
    if compass_df_Instrument_Label["CountryLabel"].loc[x] == "OECD Mean":
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " (" + (round(compass_df_Instrument_Label['InstrumentID'].loc[x],1).astype(float)).astype(str) + ") "
    else:
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " ("  + (compass_df_Instrument_Label['InstrumentID'].loc[x].astype(int)).astype(str) + ") "     

compass_df_Instrument_Label
        
# Prepare the dataframe for graph inclusion:
# Reorder the columns to have variables on the right:
compass_df_Instrument = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted43', 'isbudgeted44', 'isbudgeted45', 'isbudgeted46', 'isbudgeted807', 'isbudgeted808', 'isbudgeted47', 'isbudgeted48', 'isbudgeted49', 'isbudgeted50', 'isbudgeted31', 'F78:V43', 'F78:V44', 'F78:V45', 'F78:V46', 'F78:V807', 'F78:V808', 'F78:V47', 'F78:V48', 'F78:V49', 'F78:V50', 'F78:V31'])

# The dataframe obtained shows counts of initiatives for each time horizon - Compute the proportion instead
compass_df_Instrument["F78:V32"] = 0
for x in range(16, compass_df_Instrument.shape[1]-1):
    compass_df_Instrument["F78:V32"] = compass_df_Instrument["F78:V32"] + compass_df_Instrument.iloc[:,x]
for x in range(16, compass_df_Instrument.shape[1]):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] / compass_df_Instrument["F78:V32"] #compass_df_Instrument.iloc[:,compass_df_Instrument.shape[1]]

    # Select the countries to analyse
compass_df_Instrument2 = compass_df_Instrument.loc[compass_df_Instrument["CountryCode"].isin(list(my_countries["CountryCode"]))]

    # Trim the dataframe to only core data
compass_df_Instrument2 = compass_df_Instrument2[['CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted43', 'isbudgeted44', 'isbudgeted45', 'isbudgeted46', 'isbudgeted807', 'isbudgeted808', 'isbudgeted47', 'isbudgeted48', 'isbudgeted49', 'isbudgeted50', 'isbudgeted31', 'F78:V43', 'F78:V44', 'F78:V45', 'F78:V46', 'F78:V807', 'F78:V808', 'F78:V47', 'F78:V48', 'F78:V49', 'F78:V50', 'F78:V31']].copy()

    # rename the InstrumentID to total
compass_df_Instrument2 = compass_df_Instrument2.rename(columns={"InstrumentID":"Total"})

    # Aggregate data by countries in two distinct dataframe
compass_df_Instrument2_isbudgeted = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'CountryCode'], values=['isbudgeted43', 'isbudgeted44', 'isbudgeted45', 'isbudgeted46', 'isbudgeted807', 'isbudgeted808', 'isbudgeted47', 'isbudgeted48', 'isbudgeted49', 'isbudgeted50', 'isbudgeted31'], aggfunc={'isbudgeted43': sum, 'isbudgeted44': sum, 'isbudgeted45': sum, 'isbudgeted46': sum, 'isbudgeted807': sum, 'isbudgeted808': sum, 'isbudgeted47': sum, 'isbudgeted48': sum, 'isbudgeted49': sum, 'isbudgeted50': sum, 'isbudgeted31': sum}, fill_value=0).reset_index()
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='CountryLabel', ascending=True)

compass_df_Instrument2 = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'CountryCode'], values=['Total', 'isbudgeted', 'F78:V43', 'F78:V44', 'F78:V45', 'F78:V46', 'F78:V807', 'F78:V808', 'F78:V47', 'F78:V48', 'F78:V49', 'F78:V50', 'F78:V31'], aggfunc={'Total': sum, 'isbudgeted': pd.Series.mean, 'F78:V43': sum, 'F78:V44': sum, 'F78:V45': sum, 'F78:V46': sum, 'F78:V807': sum, 'F78:V808': sum, 'F78:V47': sum, 'F78:V48': sum, 'F78:V49': sum, 'F78:V50': sum, 'F78:V31': sum}, fill_value=0).reset_index()
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='CountryLabel', ascending=True)
compass_df_Instrument2['Total'] = compass_df_Instrument2['F78:V43'] + compass_df_Instrument2['F78:V44'] + compass_df_Instrument2['F78:V45'] + compass_df_Instrument2['F78:V46'] + compass_df_Instrument2['F78:V807'] + compass_df_Instrument2['F78:V808'] + compass_df_Instrument2['F78:V47'] + compass_df_Instrument2['F78:V48'] + compass_df_Instrument2['F78:V49'] + compass_df_Instrument2['F78:V50'] + compass_df_Instrument2['F78:V31']

# Prepare the dataframe for graph inclusion:

    # Due to lack of data, some countries must be removed: remove COUNTRY < 3 initiatives:

index_names = compass_df_Instrument2[(compass_df_Instrument2['isbudgeted']) < 3 ].index
compass_df_Instrument2 = compass_df_Instrument2.drop(index_names, inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(index_names, inplace = False)

    # Obtain the data in percent instead:
for x in range(2, compass_df_Instrument2.shape[1]-2):
       compass_df_Instrument2.iloc[:,x] = 100* compass_df_Instrument2.iloc[:,x]/ compass_df_Instrument2['Total']

# Rank the factors:
    # Remove the CountryCode:OECD (to exclude it from TOP computation) and store them in another dataframe (we'll need them again)
compass_df_Instrument2_filtered_OECD = compass_df_Instrument2.iloc[compass_df_Instrument2.shape[0]-1:,:]
index_names = compass_df_Instrument2[(compass_df_Instrument2['CountryCode'] == 'OECD') | (compass_df_Instrument2['CountryCode'] == 'OECD2')].index
compass_df_Instrument2.drop(index_names, inplace = True)

list_mean_order = {'CountryLabel':'TEST', 'F78:V31': compass_df_Instrument2['F78:V31'].mean(), 'F78:V43': compass_df_Instrument2['F78:V43'].mean(), 'F78:V44': compass_df_Instrument2['F78:V44'].mean(), 'F78:V45': compass_df_Instrument2['F78:V45'].mean(), 'F78:V46': compass_df_Instrument2['F78:V46'].mean(), 
                   'F78:V47': compass_df_Instrument2['F78:V47'].mean(), 'F78:V48': compass_df_Instrument2['F78:V48'].mean(), 'F78:V49': compass_df_Instrument2['F78:V49'].mean(), 'F78:V50': compass_df_Instrument2['F78:V50'].mean(), 'F78:V807': compass_df_Instrument2['F78:V807'].mean(), 'F78:V808': compass_df_Instrument2['F78:V808'].mean()}

list_mean_order_isbudgeted = {'CountryLabel':'TEST', 'isbudgeted31': compass_df_Instrument2['F78:V31'].mean(), 'isbudgeted43': compass_df_Instrument2['F78:V43'].mean(), 'isbudgeted44': compass_df_Instrument2['F78:V44'].mean(), 'isbudgeted45': compass_df_Instrument2['F78:V45'].mean(), 'isbudgeted46': compass_df_Instrument2['F78:V46'].mean(), 
                   'isbudgeted47': compass_df_Instrument2['F78:V47'].mean(), 'isbudgeted48': compass_df_Instrument2['F78:V48'].mean(), 'isbudgeted49': compass_df_Instrument2['F78:V49'].mean(), 'isbudgeted50': compass_df_Instrument2['F78:V50'].mean(), 'isbudgeted807': compass_df_Instrument2['F78:V807'].mean(), 'isbudgeted808': compass_df_Instrument2['F78:V808'].mean()}


compass_df_Instrument2 = compass_df_Instrument2.append(list_mean_order, ignore_index=True)
compass_df_Instrument2 = compass_df_Instrument2.append(compass_df_Instrument2_filtered_OECD, ignore_index=True)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.append(list_mean_order_isbudgeted, ignore_index=True)

compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['CountryCode'], axis=1).copy()
compass_df_Instrument2 = compass_df_Instrument2.drop(['CountryCode', 'isbudgeted', 'Total'], axis=1).copy()


# sort both dataframes
    # sort table compass_df_Instrument2_isbudgeted
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.T
new_header = compass_df_Instrument2_isbudgeted.iloc[0]
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted[1:]
compass_df_Instrument2_isbudgeted.columns = new_header
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.astype(float)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['TEST'], axis=1).copy()

    # sort table compass_df_Instrument2
compass_df_Instrument2 = compass_df_Instrument2.T
new_header = compass_df_Instrument2.iloc[0]
compass_df_Instrument2 = compass_df_Instrument2[1:]
compass_df_Instrument2.columns = new_header
compass_df_Instrument2 = compass_df_Instrument2.astype(float)
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2 = compass_df_Instrument2.drop(['TEST'], axis=1).copy()

# use the index to create a variable called axislocation
My_position = compass_df_Instrument2.copy()
My_position = My_position.reset_index()
My_position = My_position[["level_0", "index"]]
My_position.rename(columns={'level_0': 'axislocation'}, inplace=True)

# melt the two dataframes
compass_df_Instrument3 = compass_df_Instrument2_isbudgeted.copy()
new_header3 = list(compass_df_Instrument2_isbudgeted.columns[1:].values)
compass_df_Instrument3 = compass_df_Instrument3.melt(id_vars=["index"], value_vars=new_header3, var_name='Variable', value_name='Color')
compass_df_Instrument4 = compass_df_Instrument2.copy()
new_header4 = list(compass_df_Instrument4.columns[1:].values)
compass_df_Instrument4 = compass_df_Instrument4.melt(id_vars=["index"], value_vars=new_header4, var_name='Variable', value_name='Value')

compass_df_Instrument2 = compass_df_Instrument4.copy()
compass_df_Instrument2["Color"] = compass_df_Instrument3["Color"]

    # add the axislocation to compass_df_Instrument2

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      My_position, 
                      on ='index', 
                      how ='inner')

    # Rename the columns from Dict_Headers name

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')    
My_position = pd.merge(My_position, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')
compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                        compass_df_Instrument_Label, 
                        left_on ='Variable', 
                        right_on=['CountryLabel'],
                        how ='inner')
compass_df_Instrument2['Color'] = compass_df_Instrument2['Color'].fillna(0)
compass_df_Instrument2['Value'] = compass_df_Instrument2['Value']/100

compass_df_Instrument2 = compass_df_Instrument2.drop(['Code', 'CountryLabel', 'Variable', 'isbudgeted'], axis=1).copy()
compass_df_Instrument2.rename(columns={'CountryLabel2': 'CountryLabel'}, inplace=True)


# Generate the graph:

import plotly.graph_objects as go

Value = compass_df_Instrument2["Value"]

fig = go.Figure(data=[go.Scatter(
    x= compass_df_Instrument2["axislocation"],
    y= compass_df_Instrument2["CountryLabel"],
    text=compass_df_Instrument2["Value"].map(lambda n: '{:,.2%}'.format(n)),
    texttemplate = "{0:.2f}%",
    mode='markers',
    marker=dict(
        size=Value,
        line=dict(width=2,color='DarkSlateGrey'),
        sizemode='area',
        sizeref=2.*max(Value)/(40.**2),
        sizemin=0,
        cmax=1,
        cmin=0,
        color=compass_df_Instrument2["Color"],
        colorscale='YlOrRd',
        showscale=True,
        colorbar=dict(
            title="Instrument funding data:",
            titleside="top",
            tickmode="array",
            tickvals=[0.1, 0.5, 0.9],
            ticktext=["None reported", "Reported for half <br>of instruments", "All reported"],
            tickfont=dict(family="verdana", color='#2a3f5f', size=10),
            title_font_family="verdana",
            title_font_size=13,
            title_font_color='#2a3f5f')
    )
)])

fig.update_layout(
    title={
        'text': "Figure 10: Selection criteria for project grants for public research focuses <br> as a share of total budget-weighted instruments reported",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis=dict(
        showgrid=False,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=0.5,
        tickformat=".0%",
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),
    margin=dict(l=140, r=40, b=50, t=100),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='left',
    ),
    width=800,
    height=550,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)

fig.update_yaxes(
    tickfont=dict(family="verdana", color='#2a3f5f', size=13),
    ticks="inside",
    tickson="boundaries",
    tickcolor="#444",
    gridwidth=1,
    gridcolor="#444",
    showgrid=True
)

fig.update_xaxes(
    title_text='Selection criteria',
    title_font_family="verdana",
    title_font_size=17,
    title_font_color='#2a3f5f',
    fixedrange=True,
    range=(min(My_position["axislocation"])-1, max(My_position["axislocation"]) +1 ),
    ticktext=My_position['Long Description'],
    tickvals=My_position["axislocation"],
    showgrid=False,
    gridwidth=1,
    gridcolor="#444",
    ticks="outside",
    tickson="boundaries",
    tickcolor="#444",
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)
fig.show()

Figure 10 indicates that scientific and societal impact anticipated are the two main criteria used by grants within the five selected countries. This is in line with the results observed for the OECD mean. All the other criteria are much less represented in comparison. In relation to other countries, Germany and Canada have a more diversified set of selection criteria, whereas in France the alignment with national research priorities and the geographic location are more significant. The data for Korea appears to be too limited to draw robust conclusions.

4.3. Dedicated support to research infrastructures¶

4.3.1. Objectives¶

The last instrument addressed in this analysis concerns dedicated support to research infrastructures. This includes support for the creation of new facilities, resources and services used by the science community to conduct research and foster innovation. Figure 11 represents the prevalence of the objectives cited by these such instruments, across the five selected countries. Following the same approach as for previous charts, the data is aggregated by country using the NBW values to calculate the proportion of each objective within the total pool of instruments. More than one objective can be targeted by instruments.

# Data preparation:
    # Copy the central dataframe
compass_df_Instrument = compass_df.copy()

    # Filter on initiatives linked to "Project grants for public research"
compass_df_Instrument = compass_df_Instrument[(compass_df_Instrument.InstrumentTypeLabel == "Dedicated support to research infrastructures")]

    # Aggregate data by InstrumentID
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode', 'InstrumentID'], values=['NBW', 'isbudgeted', 'F120:V856', 'F120:V857', 'F120:V858', 'F120:V859', 'F120:V860', 'F120:V861', 'F120:V862'], aggfunc={'NBW': pd.Series.mean, 'F120:V856': sum, 'F120:V857': sum, 'F120:V858': sum, 'F120:V859': sum, 'F120:V860': sum, 'F120:V861': sum, 'F120:V862': sum, 'isbudgeted': pd.Series.mean}, margins=True, margins_name='total', fill_value=0).reset_index()

    # Add an extra column to account for missing values:
compass_df_Instrument["F120:V863"] = 0

for x in range(5, compass_df_Instrument.shape[1]-3):
    compass_df_Instrument["F120:V863"] = compass_df_Instrument["F120:V863"] + compass_df_Instrument.iloc[:,x]

for x in range(0, compass_df_Instrument.shape[0]):
    if compass_df_Instrument["F120:V863"].loc[x] != 0:
        compass_df_Instrument["F120:V863"].loc[x]  = 0
    else:
        compass_df_Instrument["F120:V863"].loc[x]  = 1
        # Note: we've added "F120:V863" to the dictionary earlier

    # Reorder the columns to have variables on the right:
compass_df_Instrument = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'NBW', 'isbudgeted', 'F120:V856', 'F120:V857', 'F120:V858', 'F120:V859', 'F120:V860', 'F120:V861', 'F120:V862', 'F120:V863'])
compass_df_Instrument['isbudgeted856'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F120:V856']
compass_df_Instrument['isbudgeted857'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F120:V857']
compass_df_Instrument['isbudgeted858'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F120:V858']
compass_df_Instrument['isbudgeted859'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F120:V859']
compass_df_Instrument['isbudgeted860'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F120:V860']
compass_df_Instrument['isbudgeted861'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F120:V861']
compass_df_Instrument['isbudgeted862'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F120:V862']
compass_df_Instrument['isbudgeted863'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F120:V863']

compass_df_Instrument['total856'] = compass_df_Instrument['F120:V856']
compass_df_Instrument['total857'] = compass_df_Instrument['F120:V857']
compass_df_Instrument['total858'] = compass_df_Instrument['F120:V858']
compass_df_Instrument['total859'] = compass_df_Instrument['F120:V859']
compass_df_Instrument['total860'] = compass_df_Instrument['F120:V860']
compass_df_Instrument['total861'] = compass_df_Instrument['F120:V861']
compass_df_Instrument['total862'] = compass_df_Instrument['F120:V862']
compass_df_Instrument['total863'] = compass_df_Instrument['F120:V863']

# Weight each data by NBW
for x in range(6, compass_df_Instrument.shape[1]-16):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] * compass_df_Instrument['NBW']

# Aggregate the data by countries
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode'], values=['isbudgeted', 'InstrumentID', 'F120:V856', 'F120:V857', 'F120:V858', 'F120:V859', 'F120:V860', 'F120:V861', 'F120:V862', 'F120:V863', 'isbudgeted856', 'isbudgeted857', 'isbudgeted858', 'isbudgeted859', 'isbudgeted860', 'isbudgeted861', 'isbudgeted862', 'isbudgeted863', 'total856', 'total857', 'total858', 'total859', 'total860', 'total861', 'total862', 'total863'], aggfunc={'isbudgeted': sum, 'InstrumentID':pd.Series.nunique, 'F120:V856': sum, 'F120:V857': sum, 'F120:V858': sum, 'F120:V859': sum, 'F120:V860': sum, 'F120:V861': sum, 'F120:V862': sum, 'F120:V863': sum, 'isbudgeted856': sum, 'isbudgeted857': sum, 'isbudgeted858': sum, 'isbudgeted859': sum, 'isbudgeted860': sum, 'isbudgeted861': sum, 'isbudgeted862': sum, 'isbudgeted863': sum, 'total856': sum, 'total857': sum, 'total858': sum, 'total859': sum, 'total860': sum, 'total861': sum, 'total862': sum, 'total863': sum}, margins=True, margins_name='total2', fill_value=0).reset_index()
index_names = compass_df_Instrument[(compass_df_Instrument['OECD_STATUS'] == '')].index
compass_df_Instrument.drop(index_names, inplace = True)

compass_df_Instrument['isbudgeted856'] = compass_df_Instrument['isbudgeted856'] / compass_df_Instrument['total856']
compass_df_Instrument['isbudgeted857'] = compass_df_Instrument['isbudgeted857'] / compass_df_Instrument['total857']
compass_df_Instrument['isbudgeted858'] = compass_df_Instrument['isbudgeted858'] / compass_df_Instrument['total858']
compass_df_Instrument['isbudgeted859'] = compass_df_Instrument['isbudgeted859'] / compass_df_Instrument['total859']
compass_df_Instrument['isbudgeted860'] = compass_df_Instrument['isbudgeted860'] / compass_df_Instrument['total860']
compass_df_Instrument['isbudgeted861'] = compass_df_Instrument['isbudgeted861'] / compass_df_Instrument['total861']
compass_df_Instrument['isbudgeted862'] = compass_df_Instrument['isbudgeted862'] / compass_df_Instrument['total862']
compass_df_Instrument['isbudgeted863'] = compass_df_Instrument['isbudgeted863'] / compass_df_Instrument['total863']

# Get rid of intermediate totals (not needed any more)
compass_df_Instrument = compass_df_Instrument.drop(['total856', 'total857', 'total858', 'total859', 'total860', 'total861', 'total862', 'total863'], axis=1).copy()

# Generate a dataframe filtered on OECD member only:
compass_df_Instrument_bis = compass_df_Instrument[compass_df_Instrument.OECD_STATUS == "Member"]
new_row = {'CountryLabel':'OECD Mean', 'OECD_STATUS': 'Member', 'CountryCode':'OECD2', 'F120:V856': round(compass_df_Instrument_bis['F120:V856'].mean(),1), 'F120:V857': round(compass_df_Instrument_bis['F120:V857'].mean(),1), 'F120:V858': round(compass_df_Instrument_bis['F120:V858'].mean(),1), 'F120:V859': round(compass_df_Instrument_bis['F120:V859'].mean(),1), 'F120:V860': compass_df_Instrument_bis['F120:V860'].mean(), 'F120:V861': compass_df_Instrument_bis['F120:V861'].mean(), 'F120:V862': compass_df_Instrument_bis['F120:V862'].mean(), 'F120:V863': compass_df_Instrument_bis['F120:V863'].mean(), 
           'isbudgeted': compass_df_Instrument_bis['isbudgeted'].mean(), 'isbudgeted856': compass_df_Instrument_bis['isbudgeted856'].mean(), 'isbudgeted857': compass_df_Instrument_bis['isbudgeted857'].mean(), 'isbudgeted858': compass_df_Instrument_bis['isbudgeted858'].mean(), 'isbudgeted859': compass_df_Instrument_bis['isbudgeted859'].mean(), 'isbudgeted860': compass_df_Instrument_bis['isbudgeted860'].mean(), 'isbudgeted861': compass_df_Instrument_bis['isbudgeted861'].mean(), 'isbudgeted862': compass_df_Instrument_bis['isbudgeted862'].mean(), 'isbudgeted863': compass_df_Instrument_bis['isbudgeted863'].mean(), 
           'InstrumentID': compass_df_Instrument_bis['InstrumentID'].mean()}
new_row2 = {'CountryLabel':'OECD Median', 'OECD_STATUS': 'Member', 'CountryCode':'OECD', 'F120:V856': round(compass_df_Instrument_bis['F120:V856'].median(),1), 'F120:V857': round(compass_df_Instrument_bis['F120:V857'].median(),1), 'F120:V858': round(compass_df_Instrument_bis['F120:V858'].median(),1), 'F120:V859': round(compass_df_Instrument_bis['F120:V859'].median(),1), 'F120:V860': compass_df_Instrument_bis['F120:V860'].median(), 'F120:V861': compass_df_Instrument_bis['F120:V861'].median(), 'F120:V862': compass_df_Instrument_bis['F120:V862'].median(), 'F120:V863': compass_df_Instrument_bis['F120:V863'].median(), 
           'isbudgeted': compass_df_Instrument_bis['isbudgeted'].mean(), 'isbudgeted856': compass_df_Instrument_bis['isbudgeted856'].median(), 'isbudgeted857': compass_df_Instrument_bis['isbudgeted857'].median(), 'isbudgeted858': compass_df_Instrument_bis['isbudgeted858'].median(), 'isbudgeted859': compass_df_Instrument_bis['isbudgeted859'].median(), 'isbudgeted860': compass_df_Instrument_bis['isbudgeted860'].median(), 'isbudgeted861': compass_df_Instrument_bis['isbudgeted861'].median(), 'isbudgeted862': compass_df_Instrument_bis['isbudgeted862'].median(), 'isbudgeted863': compass_df_Instrument_bis['isbudgeted863'].median(), 
           'InstrumentID': compass_df_Instrument_bis['InstrumentID'].median()}


# Append the OECD data to the dataframes
compass_df_Instrument = compass_df_Instrument.append(new_row, ignore_index=True)

# Generate new labels dataframe (this module which generates custom label, is conserved in this graph's code to conserve the same structure as other graphs, but is not used)
compass_df_Instrument_Label = compass_df_Instrument[['CountryLabel', 'isbudgeted', 'InstrumentID']].copy()
compass_df_Instrument_Label['CountryLabel2'] = compass_df_Instrument_Label['CountryLabel']


# Prepare the dataframe for graph inclusion:
# Reorder the columns to have variables on the right:
compass_df_Instrument = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted856', 'isbudgeted857', 'isbudgeted858', 'isbudgeted859', 'isbudgeted860', 'isbudgeted861', 'isbudgeted862','isbudgeted863','F120:V856', 'F120:V857', 'F120:V858', 'F120:V859', 'F120:V860', 'F120:V861', 'F120:V862', 'F120:V863'])

# The dataframe obtained shows counts of initiatives for each time horizon - Compute the proportion instead
compass_df_Instrument["F120:V864"] = 0
for x in range(13, compass_df_Instrument.shape[1]-1):
    compass_df_Instrument["F120:V864"] = compass_df_Instrument["F120:V864"] + compass_df_Instrument.iloc[:,x]
for x in range(13, compass_df_Instrument.shape[1]):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] / compass_df_Instrument["F120:V864"] 

    # Select the countries to analyse
compass_df_Instrument2 = compass_df_Instrument.loc[compass_df_Instrument["CountryCode"].isin(list(my_countries["CountryCode"]))]

    # Trim the dataframe to only core data and transpose it
compass_df_Instrument2 = compass_df_Instrument2[['CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted856', 'isbudgeted857', 'isbudgeted858', 'isbudgeted859', 'isbudgeted860', 'isbudgeted861', 'isbudgeted862','isbudgeted863','F120:V856', 'F120:V857', 'F120:V858', 'F120:V859', 'F120:V860', 'F120:V861', 'F120:V862', 'F120:V863']].copy()
 
    # rename the InstrumentID to total
compass_df_Instrument2 = compass_df_Instrument2.rename(columns={"InstrumentID":"Total"})

    # Aggregate data by countries in two distinct dataframe
compass_df_Instrument2_isbudgeted = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'CountryCode'], values=['isbudgeted856', 'isbudgeted857', 'isbudgeted858', 'isbudgeted859', 'isbudgeted860', 'isbudgeted861', 'isbudgeted862', 'isbudgeted863'], aggfunc={'isbudgeted856': sum, 'isbudgeted857': sum, 'isbudgeted858': sum, 'isbudgeted859': sum, 'isbudgeted860': sum, 'isbudgeted861': sum, 'isbudgeted862': sum, 'isbudgeted863': sum}, fill_value=0).reset_index()
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='CountryLabel', ascending=True)

compass_df_Instrument2 = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'CountryCode'], values=['Total', 'isbudgeted', 'F120:V856', 'F120:V857', 'F120:V858', 'F120:V859', 'F120:V860', 'F120:V861', 'F120:V862', 'F120:V863'], aggfunc={'Total': sum, 'isbudgeted': pd.Series.mean, 'F120:V856': sum, 'F120:V857': sum, 'F120:V858': sum, 'F120:V859': sum, 'F120:V860': sum, 'F120:V861': sum, 'F120:V862': sum, 'F120:V863': sum}, fill_value=0).reset_index()
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='CountryLabel', ascending=True)
compass_df_Instrument2['Total'] = compass_df_Instrument2['F120:V856'] + compass_df_Instrument2['F120:V857'] + compass_df_Instrument2['F120:V858'] + compass_df_Instrument2['F120:V859'] + compass_df_Instrument2['F120:V860'] + compass_df_Instrument2['F120:V861'] + compass_df_Instrument2['F120:V862'] + compass_df_Instrument2['F120:V863']

# Prepare the dataframe for graph inclusion:

    # Due to lack of data, some countries must be removed: remove COUNTRY < 3 initiatives:

index_names = compass_df_Instrument2[(compass_df_Instrument2['isbudgeted']) < 3 ].index
compass_df_Instrument2 = compass_df_Instrument2.drop(index_names, inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(index_names, inplace = False)

    # Obtain the data in percent instead:
for x in range(2, compass_df_Instrument2.shape[1]-2):
       compass_df_Instrument2.iloc[:,x] = 100* compass_df_Instrument2.iloc[:,x]/ compass_df_Instrument2['Total']

# Rank the factors:
    # Remove the CountryCode:OECD (to exclude it from TOP computation) and store them in another dataframe (we'll need them again)
compass_df_Instrument2_filtered_OECD = compass_df_Instrument2.iloc[compass_df_Instrument2.shape[0]-1:,:]
index_names = compass_df_Instrument2[(compass_df_Instrument2['CountryCode'] == 'OECD') | (compass_df_Instrument2['CountryCode'] == 'OECD2')].index
compass_df_Instrument2.drop(index_names, inplace = True)

list_mean_order = {'CountryLabel':'TEST', 'F120:V856': compass_df_Instrument2['F120:V856'].mean(), 'F120:V857': compass_df_Instrument2['F120:V857'].mean(), 'F120:V858': compass_df_Instrument2['F120:V858'].mean(), 'F120:V859': compass_df_Instrument2['F120:V859'].mean(), 'F120:V860': compass_df_Instrument2['F120:V860'].mean(), 
                   'F120:V861': compass_df_Instrument2['F120:V861'].mean(), 'F120:V862': compass_df_Instrument2['F120:V862'].mean(), 'F120:V863': compass_df_Instrument2['F120:V863'].mean()}

list_mean_order_isbudgeted = {'CountryLabel':'TEST', 'isbudgeted856': compass_df_Instrument2['F120:V856'].mean(), 'isbudgeted857': compass_df_Instrument2['F120:V857'].mean(), 'isbudgeted858': compass_df_Instrument2['F120:V858'].mean(), 'isbudgeted859': compass_df_Instrument2['F120:V859'].mean(), 'isbudgeted860': compass_df_Instrument2['F120:V860'].mean(), 
                   'isbudgeted861': compass_df_Instrument2['F120:V861'].mean(), 'isbudgeted862': compass_df_Instrument2['F120:V862'].mean(), 'isbudgeted863': compass_df_Instrument2['F120:V863'].mean()}


compass_df_Instrument2 = compass_df_Instrument2.append(list_mean_order, ignore_index=True)
compass_df_Instrument2 = compass_df_Instrument2.append(compass_df_Instrument2_filtered_OECD, ignore_index=True)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.append(list_mean_order_isbudgeted, ignore_index=True)

compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['CountryCode'], axis=1).copy()
compass_df_Instrument2 = compass_df_Instrument2.drop(['CountryCode', 'isbudgeted', 'Total'], axis=1).copy()


# sort both dataframes
    # sort table compass_df_Instrument2_isbudgeted
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.T
new_header = compass_df_Instrument2_isbudgeted.iloc[0]
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted[1:]
compass_df_Instrument2_isbudgeted.columns = new_header
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.astype(float)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['TEST'], axis=1).copy()

    # sort table compass_df_Instrument2
compass_df_Instrument2 = compass_df_Instrument2.T
new_header = compass_df_Instrument2.iloc[0]
compass_df_Instrument2 = compass_df_Instrument2[1:]
compass_df_Instrument2.columns = new_header
compass_df_Instrument2 = compass_df_Instrument2.astype(float)
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2 = compass_df_Instrument2.drop(['TEST'], axis=1).copy()

# use the index to create a variable called axislocation
My_position = compass_df_Instrument2.copy()
My_position = My_position.reset_index()
My_position = My_position[["level_0", "index"]]
My_position.rename(columns={'level_0': 'axislocation'}, inplace=True)

# melt the two dataframes
compass_df_Instrument3 = compass_df_Instrument2_isbudgeted.copy()
new_header3 = list(compass_df_Instrument2_isbudgeted.columns[1:].values)
compass_df_Instrument3 = compass_df_Instrument3.melt(id_vars=["index"], value_vars=new_header3, var_name='Variable', value_name='Color')
compass_df_Instrument4 = compass_df_Instrument2.copy()
new_header4 = list(compass_df_Instrument4.columns[1:].values)
compass_df_Instrument4 = compass_df_Instrument4.melt(id_vars=["index"], value_vars=new_header4, var_name='Variable', value_name='Value')

compass_df_Instrument2 = compass_df_Instrument4.copy()
compass_df_Instrument2["Color"] = compass_df_Instrument3["Color"]

    # add the axislocation to compass_df_Instrument2

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      My_position, 
                      on ='index', 
                      how ='inner')

    # Rename the columns from Dict_Headers name

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')    
My_position = pd.merge(My_position, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')
compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                        compass_df_Instrument_Label, 
                        left_on ='Variable', 
                        right_on=['CountryLabel'],
                        how ='inner')
compass_df_Instrument2['Color'] = compass_df_Instrument2['Color'].fillna(0)
compass_df_Instrument2['Value'] = compass_df_Instrument2['Value']/100

compass_df_Instrument2 = compass_df_Instrument2.drop(['Code', 'CountryLabel', 'Variable', 'isbudgeted'], axis=1).copy()
compass_df_Instrument2.rename(columns={'CountryLabel2': 'CountryLabel'}, inplace=True)

# Generate the graph:

import plotly.graph_objects as go

Value = compass_df_Instrument2["Value"]

fig = go.Figure(data=[go.Scatter(
    x= compass_df_Instrument2["axislocation"],
    y= compass_df_Instrument2["CountryLabel"],
    text=compass_df_Instrument2["Value"].map(lambda n: '{:,.2%}'.format(n)),
    texttemplate = "{0:.2f}%",
    mode='markers',
    marker=dict(
        size=Value,
        line=dict(width=2,color='DarkSlateGrey'),
        sizemode='area',
        sizeref=2.*max(Value)/(40.**2),
        sizemin=0,
        cmax=1,
        cmin=0,
        color=compass_df_Instrument2["Color"],
        colorscale='YlOrRd',
        showscale=True,
        colorbar=dict(
            title="Instrument funding data:",
            titleside="top",
            tickmode="array",
            tickvals=[0.1, 0.5, 0.9],
            ticktext=["None reported", "Reported for half <br>of instruments", "All reported"],
            tickfont=dict(family="verdana", color='#2a3f5f', size=10),
            title_font_family="verdana",
            title_font_size=13,
            title_font_color='#2a3f5f')
    )
)])

fig.update_layout(
    title={
        'text': "Figure 11: Objectives in dedicated support for research infrastructures <br> as a share of total budget-weighted instruments reported",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis=dict(
        showgrid=False,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=0.5,
        tickformat=".0%",
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),
    margin=dict(l=140, r=40, b=50, t=100),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='left',
    ),
    width=800,
    height=550,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)

fig.update_yaxes(
    tickfont=dict(family="verdana", color='#2a3f5f', size=13),
    ticks="inside",
    tickson="boundaries",
    tickcolor="#444",
    gridwidth=1,
    gridcolor="#444",
    showgrid=True
)

fig.update_xaxes(
    title_text='Objectives reported',
    title_font_family="verdana",
    title_font_size=17,
    title_font_color='#2a3f5f',
    fixedrange=True,
    range=(min(My_position["axislocation"])-1, max(My_position["axislocation"]) +1 ),
    ticktext=My_position['Long Description'], 
    tickvals=My_position["axislocation"],
    showgrid=False,
    gridwidth=1,
    gridcolor="#444",
    ticks="outside",
    tickson="boundaries",
    tickcolor="#444",
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)
fig.show()

Compared to other instruments analysed in this notebook, there sample size is much smaller, i.e. with on average 4.7 instruments in OECD countries. Moreover, in many instances no objective has been reported. This raises questions on the comprehensiveness of the data and whether it is sufficient for interpreting the data for the five selected countries. With regards to the OECD mean, alignment with national research priorities, support to internationalization of research and science-industry collaboration are the main objectives.

5. Outlook¶

The STIP Compass dataset is rich and potentially lends itself to cross-country comparison. The visualisations produced in this notebook are meant to be interpreted as indications of trends rather than precise measures, due to the qualitative nature of the data. The various policy indicators built in this notebook are normalised against the data reported by each country, to take into account different levels of reporting and make comparisons possible. Such indicators could benefit from being better placed within a broader national context, e.g. using statistical indicators and other kinds of evidence.

The main challenge in this type of analysis is that the levels of granularity in the data reported vary from country to country. The raw data needs to be inspected, in consultation with government officials and academics working in the field of public research, to determine whether it is representative of ongoing national policies. The visualisations in this notebook could also be used to this end, as country experts should be able to point out inconsistencies in the patterns emerging from the data.

This notebook is made available online as a proof of concept, to offer country policy analysts tools to reproduce and explore benchmarking national policies using the STIP Compass dataset.