Port Performance from A Policy Perspective – A Systematic Review of the Literature

Owing to their diverse functionality, seaports produce a mix of private and public goods that have significant welfare implications for the regions they serve. In effect, performance in seaports can be viewed as multi-dimensional. When forming policy that affects multiple stakeholders it is therefore desirable to measure performance across all relevant dimensions, as they are likely to have differing effects on the stakeholders concerned. The objective of this study is to conduct a systematic literature review of published studies on seaport performance measurement to identify, critically evaluate and integrate the various dimensions of seaport performance measurement. The review focused on key questions in performance measurement system design, namely what to measure and how to measure it. Our study finds that research on measuring port performance has expanded rapidly, leading to significant advancement in the development of methods to create different measures of port performance. However, there has been less progress in advancing means to define what constitutes performance as a construct, particularly when performance is perceived as multidimensional. In this review, five dimensions of seaport performance were identified. In addition, a formative construct of performance was proposed for the design of performance measurement systems to address policy concerns when performance is of a multidimensional nature. This review is available in Journal of Ocean and Coastal Economics: https://cbe.miis.edu/joce/vol6/iss1/3


"... synthesizes data, and reports the evidence in such a way that allows reasonably clear conclusions to be reached about what is and what is not known". A similar approach was applied in order to review the literature on port competitiveness based on methodology suggested by Parola et al. (2016).
The methodology is outlined in the following four steps:

Step 1: Question Formulation
The review question should provide focus for the review process, providing a reference point for the creation of strategy in relation to the location, selection and evaluation of studies. Neely et al. (2005, 1228) define performance measurement as the process of "quantifying action, where measurement is the process of quantification and action leads to performance". Put simply, performance measures can be defined as metrics used to quantify the efficiency and/or effectiveness of an action, whereas a performance measurement system is the set of performance measures. Nudurupati et al. (2011) further propose that contemporary literature on the design of performance measurement systems focuses on two fundamental questions, namely "what to measure and how to structure the Performance Measurement System, i.e. they try to answer the question 'how to design the Performance Measurement System?'" (Nudurupati et al. 2011, 287).
In accordance with the above, the questions for this systematic review were formulated as follows: How has the literature on seaport performance, in particular on the process of performance measurement system design, addressed the fundamental questions of i) what to measure and ii) how to structure the performance measurement system? In order for a review question to be searchable, the research question must be broken down into key words. For this analysis, the key terms selected were "Port", "Performance" and "Measurement Systems".

Step 2: Study Location
Following the formulation of the research questions, the next step in the systematic review process was the location of studies. Our systematic review involved the location, selection and appraisal of the relevant knowledge specific to the research inquiry (Denyer and Tranfield 2009). The following steps were taken to locate studies.

Selecting Databases
In order to ensure as much coverage of published literature as possible, existing literature reviews were examined in conjunction with key articles identified in early scoping work. An earlier extensive review of methodological issues in seaport research found that the academic journals Maritime Policy and Management and Maritime Economics and Logistics were the publications most cited during the period 1980-2009, with 233 and 101 citations respectively. Other prominent journals identified included Transportation Research Part A: Policy and Practice, Transportation Research Part B: Methodological, and Transport Reviews. The databases Scopus, EconLit, Business Source Complete and Academic Search Complete provided comprehensive coverage of the identified journals of importance, as well as additional coverage of multiple disciplines important to satisfy the principle of inclusivity.

Search strategy and preliminary exclusion criteria
Four preliminary exclusionary conditions were established and applied to the searches in turn. First, in practical terms, the review had to be limited to studies that had a bearing on the specific research question, so all articles unrelated to seaport performance were excluded at the outset. The second condition was that only articles published in peer-reviewed journals were included for review. This was to improve quality control in the initial search. Third, the applicability of articles was confirmed by the presence of the three selected key terms in the title, abstract and keywords. Finally, only articles in the English language were selected.
Using keywords to create search strings, pilot searches and the subsequent refinement of search strings
Search strings were built from the key words identified in Step 1. To ensure inclusivity, variants of the search terms were included and the terms were brought together using Boolean operators. In order to test and refine the search strings, initial pilot tests were conducted using Scopus, the largest of the selected citation databases. As each database has different functionalities, each exclusionary step taken was related to the condition that the review had to be limited to studies that have a bearing on its specific research question. The most common issue concerned the multiple meanings of the term "port". As the word is used to describe a variety of settings in areas such as electrical engineering and computer science, literature linking the words "port" and "performance" was extensive, covering a wide variety of unrelated disciplines. An essential component of the SLR is the clear documentation of exclusion criteria and recording of searches (see footnote 2). Upon completion, a total of 1,766 articles were identified across the four databases. Reflective of the multidisciplinary nature of the research, articles were spread throughout fields, with the highest number in engineering and social sciences, but also across disciplines such as computer science and environmental science.
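The construction of a search string from key terms and their variants can be sketched as follows. This is a minimal illustration only: the term variants and excluded phrases are hypothetical and are not the strings used in the review, and the generic AND / OR / AND NOT syntax would need adapting to each database's own query language.

```python
def build_search_string(term_groups, excluded=()):
    """Join variants with OR inside each group, groups with AND, exclusions with AND NOT."""
    clauses = [
        "(" + " OR ".join(f'"{term}"' for term in group) + ")"
        for group in term_groups
    ]
    query = " AND ".join(clauses)
    if excluded:
        query += " AND NOT (" + " OR ".join(f'"{term}"' for term in excluded) + ")"
    return query


# Hypothetical variants of the three key terms from Step 1.
term_groups = [
    ["port", "seaport"],
    ["performance", "efficiency"],
    ["measurement system", "indicator"],
]
# Hypothetical phrases that remove unrelated senses of the word "port".
excluded = ["serial port", "network port"]

print(build_search_string(term_groups, excluded))
```

Keeping the variants and exclusions as data rather than as a hand-typed string makes each pilot search easy to record and repeat, which supports the documentation requirement described above.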

Step 3: Study Selection and Evaluation
A relevancy appraisal checklist (Petticrew and Roberts 2008) was then used to rank articles based on their relevance to the research questions (the checklist is presented in Table 1). As shown in Table 1, studies in Group A were used for further analysis, while studies in Group B provided support to the analysis where relevant but were not counted in the final analysis or findings. Studies in Group C were excluded on the basis that they were not sufficiently relevant to the context of the research questions. Following the application of the Relevancy Appraisal Tool, 227 articles were identified as Group A studies. Following Denyer and Tranfield's (2009) methodology, a review of grey literature was conducted to improve completeness. Grey literature refers to studies that are published outside of peer-reviewed journals and includes conference papers and industry reports. From this category, 14 studies were added, giving a total of 243 studies included for literature analysis.

Table 1 - Relevancy Appraisal Checklist
Group A
1. Empirical studies that measure the performance of seaports in relation to the ports' primary function as a centre for the transfer of goods and people and related activity.
2. Analytical studies that contribute to theory on seaport performance measurement in a way that is not captured or rigorously tested in empirical work, including studies that apply a system of measurement to measure the performance of seaports and create performance measurements, metrics or indicators.
Group B
1. Studies that contribute to theory on seaport performance measurement, but do not measure seaport performance directly or measure elements of performance too narrow in scope or context to provide general insight into the effects of the ports' primary functioning as described above.
2. Studies that review or critique Group A studies.
Group C
1. Studies that address elements of performance unrelated to the effect of the performance of a port's primary functions as described above.
2. Articles in which the principal theory or finding put forward has been integrated into later more comprehensive studies.
3. Empirical or analytical studies whose original contributions have been undermined by later studies in their field of research.

2 Details of the iterative searches as well as the documented exclusion criteria are available on request.

Step 4: Analysis
After the selection of sources, it is necessary to analyse the contained data as a means of categorising the literature and identifying the different dimensions of performance. Categories were formed by grouping studies based on the dimensions of performance and the resulting measures examined.

FINDINGS
In total five dimensions were identified as detailed in Table 2. What follows is a thematic discussion of the literature identified within the review, focusing on the measures of performance created in each of the five respective dimensions of seaport performance.

Operational Dimension
Performance on the operational dimension measures how well relevant resources are utilised in the provision of seaport services. Notably, the unit of analysis in this category differed across studies. Studies examined the provision of services ranging from individual seaport services within a port (e.g. terminal operators), to the provision of all services from a given port, to the provision of services from clusters of seaports. Within the literature reviewed, two main approaches to measuring operational level performance were identified. The first measures the efficiency and productivity of seaports from an economics perspective. This typically involves measuring the relationship between inputs and outputs in the production cycle using empirical data. The second approach measures current production levels with respect to potential production, that is, theoretical capacity. This is typically measured by calculating the seaport's engineering optimum level of performance (Talley 2006).
Firstly, from the economic perspective, seaport productivity has been a topic of research since the 1970s, with early studies focused on partial productivity measures (UNCTAD 1976; Suykens 1983; G. DeMonie 1987; Tongzon 1995). Partial productivity measures are useful when assessing specific factors of production, yet when used in isolation, they have been found to deliver misleading accounts of overall productivity (Notteboom et al. 2000). As a result, measures of productivity and efficiency that take into account all inputs and outputs have come to be favoured in the literature. The two main types are i) total factor productivity measures and ii) measures of productive efficiency. The measurement of productive efficiency mainly follows the approach adopted in Farrell (1957). Overall efficiency is separated into technical efficiency (maximisation of production possibility) and allocative efficiency (cost minimisation of input ratios, or profit maximisation, depending on the behavioural assumption) (Kopp and Diewert 1982). The reference technology is assessed by estimating a production possibility (or cost minimisation) frontier and then examining the performance of existing seaports relative to the frontier. The two most prevalent methods for estimating efficiency frontiers are the non-parametric Data Envelopment Analysis and the parametric Stochastic Frontier Analysis.
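The idea of scoring ports relative to a best-practice frontier can be illustrated with the simplest possible case. The sketch below assumes a single input, a single output and constant returns to scale, in which case the frontier is defined by the port with the highest output per unit of input and each port's technical efficiency is its productivity relative to that benchmark. The port names and figures are hypothetical. Real applications with multiple inputs and outputs require a linear programme per port (Data Envelopment Analysis) or an econometric model (Stochastic Frontier Analysis).

```python
def crs_efficiency(ports):
    """Return technical efficiency scores in (0, 1] for one input and one output.

    ports maps a port name to a tuple (input_quantity, output_quantity).
    Under constant returns to scale the frontier is set by the highest
    output/input ratio in the sample.
    """
    productivity = {name: out / inp for name, (inp, out) in ports.items()}
    best = max(productivity.values())
    return {name: ratio / best for name, ratio in productivity.items()}


# Hypothetical data: (crane hours, containers handled).
sample = {
    "Port A": (100.0, 500.0),
    "Port B": (200.0, 600.0),
    "Port C": (150.0, 600.0),
}

for name, score in crs_efficiency(sample).items():
    print(f"{name}: {score:.2f}")
```

A score of 1 marks a port on the frontier. A score of 0.6 indicates that, in this sample, the same output could in principle be produced with 60% of the input used.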
The second approach measures seaport performance against the potential operational capacity of seaports either through simulation or analytically by using queuing theory. An advantage of this approach is that it allows for the testing of alternative strategies and configurations and the examination of seaport operations under varying conditions. In this manner, the models created act as decision supports, allowing management to estimate the results of various strategies and inform subsequent decisions. In addition, at a strategic management level, these models provide insight into the productive capabilities of a seaport and can be used to inform investment decisions when additional capacity is required.
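As an illustration of the analytical route, the sketch below treats a terminal as an M/M/c queue, where ships arrive at random, service times are exponentially distributed and c identical berths are available. These are simplifying assumptions chosen for the example, not the specification used by any particular study in the review, and the arrival and service rates are hypothetical. The probability that a ship must wait is given by the Erlang C formula.

```python
from math import factorial


def mmc_berth_queue(arrival_rate, service_rate, berths):
    """Return (utilisation, probability of waiting, mean wait) for an M/M/c queue.

    arrival_rate: ships arriving per unit of time
    service_rate: ships one berth can serve per unit of time
    berths: number of identical berths (c)
    """
    offered_load = arrival_rate / service_rate
    utilisation = offered_load / berths
    if utilisation >= 1:
        raise ValueError("arrival rate exceeds capacity; the queue grows without bound")
    # Erlang C: probability that an arriving ship finds all berths occupied.
    tail = offered_load ** berths / (factorial(berths) * (1 - utilisation))
    head = sum(offered_load ** k / factorial(k) for k in range(berths))
    prob_wait = tail / (head + tail)
    mean_wait = prob_wait / (berths * service_rate - arrival_rate)
    return utilisation, prob_wait, mean_wait


# Hypothetical scenario: 1 ship per day arrives, each berth serves 2 ships per day.
for berths in (1, 2):
    utilisation, prob_wait, mean_wait = mmc_berth_queue(1.0, 2.0, berths)
    print(berths, round(utilisation, 3), round(prob_wait, 3), round(mean_wait, 4))
```

Running the function for alternative berth counts or service rates is a small-scale version of the strategy testing described above: the model shows how waiting time responds to added capacity before any investment is made.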

Customer Perspective Dimension
Studies in this category were grouped in order to examine the effectiveness of seaport service delivery and to examine the competitiveness of a seaport as determined by seaport service choice. In a competitive environment, the level of demand for services, relative to competitors, can generally be taken to indicate the quality of services. Seaports, however, often serve captive markets, and the quality of services provided in seaports will only be one factor in determining a seaport's competitiveness. Therefore, in order to measure the quality of seaport services and competitiveness, it is necessary to measure a seaport's performance from the perspective of its customers.
The most prominent research on seaport effectiveness comes from a series of studies by Brooks, Schellinck and Pallis (Brooks et al. 2011a, 2011b; Schellinck 2013, 2015), where effectiveness is defined as "doing the right things" for the customer. Framed from a marketing perspective, effectiveness is delineated as a complement to efficiency (described as "doing things right"). As discussed in the literature, it concerns the quality of service from the user perspective. In total, eight studies created measures of effectiveness, with data collected by way of survey and methods such as importance-performance analysis, Analytic Hierarchy Process and Confirmatory Factor Analysis employed to validate measures.
The second category is larger and more diverse, with 30 studies selected. These studies are grouped on the basis that they focus on measuring the criteria that determine competitiveness from the seaport user's perspective. In comparison with the measurement of service effectiveness, these studies analyse the factors that determine seaport quality and will often factor in determinants outside of the seaport's control. Most frequently, the stakeholder perspective is that of the shipping line, with an increasing trend towards looking at the shipper and intermediaries (A. S.-F. Ng, Sun, and Bhattacharjya 2013; Tongzon 2009; Yuen et al. 2012; Yeo et al. 2011). The majority of studies reviewed measured the factors of seaport competitiveness by modelling seaport choice in a multi-criteria decision-making model. Analytic Hierarchy Process is the most popular method employed to estimate preference weights; it relies on expert judgements in pairwise comparisons. Discrete choice modelling was also prominent in the reviewed literature.
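How pairwise comparisons become preference weights can be shown with a small sketch. It uses the row geometric mean method, a common approximation to the principal eigenvector calculation usually associated with the Analytic Hierarchy Process; the choice of approximation, the three criteria and the comparison values are all assumptions made for this example.

```python
from math import prod


def ahp_weights(pairwise):
    """Estimate preference weights from a reciprocal pairwise comparison matrix.

    pairwise[i][j] states how many times more important criterion i is than
    criterion j. Weights are the normalised geometric means of the rows.
    """
    size = len(pairwise)
    geometric_means = [prod(row) ** (1.0 / size) for row in pairwise]
    total = sum(geometric_means)
    return [value / total for value in geometric_means]


# Hypothetical expert judgements for three port choice criteria.
criteria = ["port charges", "hinterland access", "service reliability"]
comparisons = [
    [1.0, 2.0, 6.0],
    [1.0 / 2.0, 1.0, 3.0],
    [1.0 / 6.0, 1.0 / 3.0, 1.0],
]

for name, weight in zip(criteria, ahp_weights(comparisons)):
    print(f"{name}: {weight:.2f}")
```

The example matrix is perfectly consistent, so the method recovers the underlying weights exactly. With real expert judgements the matrix is usually inconsistent to some degree, and a consistency check is normally reported alongside the weights.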

Logistics Chain Dimension
Seaports act as functional nodes within a transport network for the movement of goods within the supply chain logistics process. There is a body of literature that measures the performance of seaports in relation to their position within the transport network and the supply chain, with two interrelated types of measures identified in this review. The first type refers to hinterland and foreland connectivity and accessibility measures. The second concerns the seaport's position or integration within its supply chain.
The concept of regionalisation of port systems describes the increase in linkages between hinterland and ports as well as the integration of intermediate hubs for transhipment purposes (Notteboom and Rodrigue 2005; Rodrigue and Notteboom 2010). These developments are key drivers in the shift in the traditional seaport paradigm from "captive" to "contestable" or shared hinterlands (Ferrari et al. 2011). Cullinane and Wang (2009) describe the competitiveness of seaports as nodes as relative to the mass of other nodes and the cost of reaching those nodes via the infrastructural network. In the literature reviewed, graph theory is most commonly applied to create measures of accessibility. Measures include the connectivity between nodes in the maritime transport network and measures of accessibility of seaports in relation to the hinterlands they serve.
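A graph-based accessibility measure of the kind described can be sketched as follows. The network is a weighted graph, the cost of reaching each node is the shortest-path cost found with Dijkstra's algorithm, and accessibility is the sum of each destination's mass divided by the cost of reaching it. This potential-type functional form, the node names, the masses and the link costs are assumptions made for illustration rather than the specification of any study in the review.

```python
import heapq


def shortest_costs(graph, source):
    """Dijkstra's algorithm: lowest cost from source to every reachable node."""
    costs = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, link_cost in graph[node].items():
            new_cost = cost + link_cost
            if new_cost < costs.get(neighbour, float("inf")):
                costs[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return costs


def accessibility(graph, mass, source):
    """Sum of destination mass divided by the cost of reaching each destination."""
    costs = shortest_costs(graph, source)
    return sum(mass[node] / cost for node, cost in costs.items() if node != source)


# Hypothetical hinterland network with symmetric link costs.
network = {
    "Port": {"Inland hub": 2.0, "City": 10.0},
    "Inland hub": {"Port": 2.0, "City": 3.0},
    "City": {"Port": 10.0, "Inland hub": 3.0},
}
# Hypothetical mass of each node, for example regional demand.
mass = {"Port": 0.0, "Inland hub": 10.0, "City": 100.0}

print(accessibility(network, mass, "Port"))
```

In this example the city is reached more cheaply through the inland hub than by the direct link, so the hub raises the port's accessibility score. This mirrors the role of intermediate hubs in the regionalisation concept.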
The findings of this review show an apparent increase in the number of papers that seek to measure the position of seaports in their logistics and supply chains. Key articles such as Robinson (2002), Carbone and Martino (2003) and Bichou and Gray (2004) describe the shifting paradigm in port functions as resulting in the increasing view of ports as embedded components of value-driven chain systems. Seaports add value through the provision of services to other elements of the value-driven chain and, as such, cannot be viewed in isolation. Rather, a key element of a seaport's performance is the integration of seaports within logistics and supply chains.
Measures of supply chain integration in the literature include measures of value added services, multimodal integration and the use of ICS (Information Communication Systems) platforms.

Macro Dimension
This dimension is concerned with the aspects of a seaport's performance that have unilateral effects on the inhabitants of the regions it serves; performance at this level is thus classed as the macro dimension. Seaports, as producers of public goods, inevitably have a long-lasting impact on their service regions. The performance of seaports has national and regional welfare implications that extend beyond their regular commercial port activities (Dekker and Verhaeghe 2012). Consequently, the studies reviewed contained measures of performance on the macro dimension that related to the environmental and economic impacts of seaports.
Seaports have direct and indirect economic impacts on the region in which they operate; through their commercial activities they generate wealth and employment for a region while, in addition, they facilitate trade and indirectly contribute to economic wealth generation. The measures produced broadly reflect this. They capture the direct and indirect effects of seaports on employment and added value, trade facilitation and the spatial economic impact of ports on their regions. Input/output and computable general equilibrium models were the two most common methodologies employed to create these measures.
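The logic of an input/output impact measure can be shown with a two-sector sketch. In the standard Leontief model, total output x satisfies x = Ax + d, where A holds the technical coefficients and d is final demand, so x = (I - A)^-1 d. The sector labels, coefficients and demand figures below are hypothetical, and the model is restricted to two sectors so that the matrix inverse can be written out by hand.

```python
def leontief_output(coefficients, final_demand):
    """Solve x = A x + d for a two-sector economy.

    coefficients[i][j] is the input from sector i needed per unit of output
    of sector j. Returns the total output of each sector.
    """
    # Entries of the matrix (I - A).
    m11, m12 = 1 - coefficients[0][0], -coefficients[0][1]
    m21, m22 = -coefficients[1][0], 1 - coefficients[1][1]
    determinant = m11 * m22 - m12 * m21
    if determinant == 0:
        raise ValueError("(I - A) is singular; no unique solution")
    # Apply the inverse of (I - A) to the final demand vector.
    x1 = (m22 * final_demand[0] - m12 * final_demand[1]) / determinant
    x2 = (-m21 * final_demand[0] + m11 * final_demand[1]) / determinant
    return [x1, x2]


# Hypothetical sectors: 0 = port services, 1 = rest of the regional economy.
A = [[0.2, 0.3],
     [0.4, 0.1]]
demand = [10.0, 20.0]

print(leontief_output(A, demand))
```

Total output exceeds final demand in both sectors because each sector also supplies the other. The difference is the indirect effect that input/output studies attribute to port activity.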
Seaports generate adverse environmental effects through regular activities. The OECD recognises effects such as air pollution, noise, water pollution stemming from ballast water handling, oil spills and antifouling of ships and waste from hazardous cargos; all of which potentially produce negative effects on their environs. As a result, there is a need to evaluate the environmental effect of seaports. Within the review, 11 studies were identified that measured the performance of seaports on environmental grounds, including the direct environmental impact of seaports as well as their spatial environmental impact.
In addition, the wider literature on seaport sustainability indicates that macro performance is increasingly important for strategic management of seaports. Dooms et al. (2015) provide a thorough literature review of port economic studies. They note that the amount of academic literature on the area is fairly limited while there is a proliferation of impact studies by the seaports themselves, most often published as part of development plans. Included in their review is a meta-analysis of 33 port studies in Belgium, the UK, France and North America. Likewise Puig et al. (2015) found that of the 79 ports surveyed in a 2013 ESPO survey, 90% of respondent ports stated that they have an environmental policy and 94% of ports have designated environmental personnel. These results represent an improvement from the last ESPO Environmental assessment survey. In this review, however, only articles published by independent bodies were considered for inclusion on the basis of impartiality.

Strategic Dimension
The final dimension identified comprises studies concerned with the performance of seaports at a strategic management level. Performance on this dimension relates to the effects of strategic decisions on the performance of seaports and how effective these decisions are in achieving strategic port goals. In particular, there is a large body of literature that examines the institutional arrangements in seaports, especially the strength and effectiveness of port governance in achieving the strategic goals of shareholders in terms of performance. This is largely attributable to the large-scale reform and restructuring in the seaport sector since the 1980s. While no agreed definition of governance exists, it generally refers to the rules and structures in place that govern managerial decisions and the scope of managerial autonomy of the port relative to its shareholders.
Further published case studies under this dimension indicate that no single best model for the port governance structure exists. The outcome of institutional reform has also been found to be path dependent on the local/national institutional frameworks and the political traditions in place (Ng and Pallis 2019; Notteboom et al. 2013). Relative to the overall body of research, few of the six studies empirically test the fit of the governance structure (Vieira et al. 2014, 16). Analytically, these studies involved the creation of evaluation frameworks under which seaport reform and governance structure fit can be examined, while empirically performance is examined pre and post reform.
Other measures examined the effects of alternative strategic management decisions (García-Morales et al. 2015), and the effects of centralised and decentralised regulation on seaport capacity, efficiency and tariffs (Zheng and Negenborn 2014). In addition to the studies identified within the literature on port efficiency, there are a number of studies that examine the effects of reform on efficiency levels in seaports.
Figure 1 illustrates the dominant measures of operational performance across the reviewed literature. However, we observe that there has been a greater dispersion in the topic of measurement in recent years. Seaport performance measurement as a whole is increasingly multidimensional and, while this review does not address the validity of the various constructs of measurement created, the range and complexity of approaches toward measuring seaport performance is growing. This is consistent with the spatial and functional evolution of ports over time as observed and documented in the various models of port development such as Bird's Anyport model, the Port Generations model, the WorkPort model and port regionalization. Ports are complex systems. They evolve over time and what constitutes performance similarly evolves over time. This is evident in the growth in the number of studies that measure seaports' competitiveness in terms of a port's capabilities, i.e. through adding value to the logistics supply chain, a feature of so-called fourth generation ports (Paixão and Marlow 2003). Similarly, as noted in Dooms et al. (2013), the composition of salient seaport stakeholder groups and the nature of such relationships can change over time. Again, this is evidenced in this review by the growth in studies measuring not only value added performance but also environmental performance. This corroborates the findings of Dooms et al. (2013), where there was a noted increase in the importance of such issues amongst stakeholders as a port developed.

Figure 1 - Profile of Reviewed Studies
In the face of evolving and multidimensional performance, there is a clear need to understand what performance is relevant at any given time in any given port. Furthermore, the findings of this review suggest that it is necessary to assess seaport performance across a number of dimensions to facilitate a more accurate appraisal of seaport performance, particularly when overall seaport performance and the needs of multiple seaport stakeholders are relevant.
To date, however, there are a limited number of studies that incorporate more than one dimension in analysis. It has been recognized that port performance is multifaceted, and a performance measurement model that incorporates performance across both efficiency and effectiveness dimensions has been proposed. Shiau and Chang's (2015) case study on the sustainability of Keelung Port proposed a number of indicators along environmental, economic, and social dimensions. The indicators were selected using a social construction of technology framework and involved interaction with multiple groups of stakeholders. There are also a number of studies that assess the effects of port performance on one dimension on another, for example governance structure on efficiency (Carvalho et al. 2010; Cheon et al. 2010), efficiency on trade facilitation (Doi, Tiwari and Itoh 2001) and efficiency on environmental performance (Chin and Low 2010). Despite this, mainstream seaport performance studies have tended to be unidimensional in their measurement scope. This is entirely valid and consistent with the objectives of the studies concerned; however, there has been little consideration given to evaluating performance from a multidimensional perspective. This is particularly relevant when such evaluation is required for effective state policy formulation to ensure that a seaport's development and strategy is consistent with national economic policy, so as to maximize overall national welfare. It is therefore worth examining how, in the context of performance measurement system design, a construct of performance can be created to incorporate a number of relevant performance dimensions.

DISCUSSION
As discussed in the introduction and highlighted throughout the findings of the review, there is a need for policymakers to understand and contextualize port performance across multiple dimensions at one time. There has been significant advancement in the development of methods by which to create different measures of port performance. However, there has been less progress in advancing means to define what constitutes performance as a construct, particularly when performance is multidimensional. It is argued here that there is a need to define port performance as a construct in a multidimensional setting. Performance itself is a latent construct and has to be measured by indicators that form an approximation to the value of the latent construct. A key distinction in measuring performance is the form that this latent construct takes, as it has strong implications for the form of the sub-constructs (if multileveled) and the measures and indicators that are employed. The primary distinction is whether the construct is reflective or formative. Schellinck and Brooks (2015, 7) argue for the latter approach to measuring port performance in relation to effectiveness of port service delivery. The authors argue that relative to a reflective construct a formative construct "… provides the level of detail that exhaustively captures the relevant/causal criteria for overall port performance". It is argued for reasons explored below that this logic extends beyond effectiveness to port performance that is multidimensional.
In order to examine this (as demonstrated in Schellinck and Brooks 2015), it is necessary to compare formative and reflective constructs. A formative construct differs from a reflective one in terms of the direction of causality between the latent construct and the measures of the construct (or sub-constructs when the construct is multi-tiered). In reflective constructs, causality flows from the latent construct to the measures: the construct is the cause and the measures are the effects of the underlying construct. In reflective constructs, measures of the construct must be unidimensional, that is, all measures must measure the underlying construct consistently, with a change in the underlying construct causing a common change in variance amongst the measures. Measures must therefore be inter-correlated. Thus, in a reflective construct it is possible to add and remove measures without changing the underlying latent construct. In formative constructs, causality flows from measure to construct: the measures cause the construct. An important difference between the two is that in formative constructs there does not have to be inter-correlation between the measures. In contrast to reflective measures, the removal of a formative indicator may alter the construct being measured (Petter, Straub, and Rai 2007; Diamantopoulos, Riefler, and Roth 2008). A reflective measure implies an underlying latent construct that is fully formed and exists objectively.
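The difference in the direction of causality can be written compactly. The notation below is a common textbook formulation and is an assumption of this illustration, not notation taken from the reviewed studies: \eta is the latent construct, x_i are the observed measures, \lambda_i are loadings, \gamma_i are weights, and \varepsilon_i and \zeta are error terms.

```latex
% Reflective specification: the construct causes each measure.
x_i = \lambda_i \, \eta + \varepsilon_i , \qquad i = 1, \dots, n

% Formative specification: the measures jointly cause the construct.
\eta = \sum_{i=1}^{n} \gamma_i \, x_i + \zeta
```

In the reflective form every x_i depends on the same \eta, which is why the measures must be inter-correlated and why any one of them can be dropped without redefining \eta. In the formative form \eta is defined by the particular set of x_i included, so removing a term from the sum changes the construct itself.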
It has been shown that what constitutes performance as a construct in a port setting is contingent on the perspective of the stakeholder concerned and how it is contextually defined (Schellinck and Brooks 2015, 7). This is evident if the different dimensions of seaport performance are examined in turn. Operationally, seaports need to meet the requirements of their user-firms that also provide services within seaports. As seaports provide services they need to meet the needs of customers. However, from the literature, it can be seen that the determinants of market demand for seaport services extend beyond the individual services provided by seaports, with seaports competing as parts of wider logistics and supply chains. Thus, commercially, seaports must perform to facilitate the execution of this supply chain. Considering the operational, customer and supply chain dimensions, seaports must therefore meet the needs of stakeholders ranging from the internal stakeholders involved in the provision of services, to the wider industry level partners who combine with seaport service providers to create effective logistics and supply chains.
On the macro dimension, a port's performance has implications for a much larger number of stakeholders. The unilateral effects of performance at macro level on regions and nations suggest that seaports have welfare implications through the public goods they produce. Performance across the different dimensions can affect different stakeholders in different ways and to different degrees. The importance of different dimensions of port performance is subjective and contingent on the perspective of the port stakeholder. Reflective structures require an objective construct of performance that is unidimensional, with a relationship between sub-constructs and measures in that they share common antecedents and consequences. Port performance is multidimensional and subjective and its measures are not necessarily related. For example, what causes strong competitiveness may not cause good community stakeholder relationships. It is therefore argued that a holistic measure of multidimensional performance requires a formative construct of performance.
To examine what this formative construct may look like, an example has been conceptualized and presented in Figure 2. Here, performance is shown as a function of the port's network and macro performance effects. The dimensions of performance that are chosen determine how performance is defined. It is clear that the measure proposed only represents a fraction of what could be measured in the context of seaport performance. If, for example, a third dimension such as customer satisfaction were added, it would essentially be a different construct of performance. Similarly, the first order sub-constructs are formative too; for example, if environmental impact was deemed to be unimportant the concept of macro performance would similarly change to a different construct.
When it comes to the third order, however, the constructs change from formative to reflective, which can have a major effect on how performance is determined and measured. In contrast to the first and second order, the measures or indicators need to be collinear and related, as they are measuring the same thing. For example, the level of emissions does not necessarily tell you about the level of supply chain integration, and is therefore not a good measure of it in a reflective sense. Similarly, at the reflective level, one could add or remove measures to improve the accuracy of the measure without fundamentally changing what one is measuring. For example, in terms of assessing the socio-economic impact of the port, one could add a spatial component to measure regional impacts yet still be measuring the socio-economic impact of the port, but with a potentially better measure.
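The layered structure described above can be sketched in equations. This is only an illustrative formalization in assumed notation: the symbols $P$ (overall performance), $N$ (network performance), $M$ (macro performance), $M_k$ (a macro sub-construct such as environmental or socio-economic impact), $x_{kj}$ (an observed indicator), the weights $\gamma$ and $\beta$, the loadings $\lambda$ and the error terms are not taken from Figure 2. The equations use the standard linear forms of formative and reflective measurement models.

```latex
\begin{align}
  % Formative: overall performance is composed of the chosen dimensions.
  P &= \gamma_N N + \gamma_M M + \zeta_P \\
  % Formative: macro performance is composed of its sub-constructs.
  M &= \sum_{k} \beta_k M_k + \zeta_M \\
  % Reflective: each indicator is a manifestation of its sub-construct.
  x_{kj} &= \lambda_{kj} M_k + \varepsilon_{kj}, \qquad j = 1, \dots, J_k
\end{align}
```

In the two formative equations the arrows run from the components to the construct, so adding a dimension to $P$ or dropping a sub-construct from $M$ changes what the construct means. In the reflective equation the indicators $x_{kj}$ share $M_k$ as a common cause and are therefore expected to covary, which is why an indicator can be added or removed without redefining $M_k$.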

POLICY IMPLICATIONS
A formative approach to performance measurement has a number of implications for policymakers. Firstly, it means that performance of a seaport or seaport system needs to be defined by the policymaker. For example, if reform is to improve efficiency and effectiveness then performance is determined by these two constructs. Similarly, the performance of a policy program to improve the macro-environmental performance would not be measured by looking at the port's profitability. When considering performance in relation to port policy and its implications, performance is a formative construct determined by the objectives of a policy program. Thus, the onus is on policymakers to develop clear definitions of what their expectations are for the performance of a seaport system within their policy remit. If the latter is to ensure that the public interest is preserved through the operation of public infrastructure, the port's performance must therefore be measured to secure that interest.
This, however, is not a simple task, as identified by Vieira et al. (2015) and most recently by Brooks et al. (2017). There is a lack of causal understanding of what type of policy program works in which context. Much of this can be attributed to the complexity of port systems. Driving this complexity is the fact that ports have a diverse stakeholder base holding conflicting interests (e.g. terminal operators, local community and environmental groups) (Bekebrede and Mayer 2006). As a result, it is difficult to know how performance in the port system should be assessed. On a related issue, Pilcher and Tseng (2017) pointed out that much of the difficulty in evaluating policy reform can be attributed to a number of factors. Firstly, they argue that ambiguity in the key terms of port reform can cause difficulty in interpreting performance; for example, efficiency and productivity can have a plurality of meanings. This is consistent with the findings in our review, as concepts such as efficiency ranged from tight definitions (such as Farrell's productive efficiency) to more loosely defined concepts of operational competency. Similarly, our review noted that competitiveness, effectiveness, accessibility and socio-economic impacts are often loosely defined. This is not a fate peculiar to the port industry; one only has to look at the debate surrounding the definition of quality in industrial production to see that this is a common occurrence.
Perhaps, as identified by Langenus and Dooms (2015), where the port industry falls down is that international advisory bodies tasked with developing port performance metric systems have not reflected such contingency-related factors in performance evaluation. In particular, the weaknesses lie in meso-level metrics, similar to the work done in port system evolution by the World Bank in previous decades. In contrast, in the more heavily regulated industry of pharmaceutical production, the good manufacturing practice guides produced by international bodies provide a vital resource on which to base action to improve performance.
Secondly, as argued by Pilcher and Tseng (2017), semantic issues such as those above can be overcome by better definition. It is argued that the other factors identified, namely time, geography and context (which we argue are all contextual), pose the greatest challenge for policymakers. Multidimensional formative constructs are context dependent (Schellinck and Brooks 2015). As discussed above, what constitutes performance is dependent on a host of factors, including the perception of the concerned stakeholders but also contextual and contingency-related factors, both spatial (geopositional, institutional, market etc.) and temporal (lifecycle, environmental dynamics, societal attitudes etc.). In addition, port performance often involves different levels of performance, such as performance at the cluster level versus performance at the individual actor level (terminal operator, customs or port authority etc.). The organizational complexity of seaports presents an added level of difficulty in determining how performance is defined. If performance is taken to be what represents the public interest and is to be a formative construct, it must therefore be defined in terms of what constitutes the public interest. Further, it is necessary to have an understanding of how spatial and temporal factors affect port performance and how policy measures are likely to affect performance as well. The lack of a clear definition of performance in a given context poses a serious limitation to the measurement and subsequent development of port policy. Much of the literature surveyed in this review details ways of measuring performance effects; however, it is argued that there remains a lack of rigorous attempts to attribute these effects to the measures of port performance.
For policymakers to make evidence-based interventions on port performance measures, as well as being able to assess changes in the level of a port's performance, it is also necessary to be able to interpret the causal factors related to such changes. A truly holistic multidimensional performance measurement system should be based on a causal understanding of different levels of performance. This is firmly accepted in the wider literature on performance measurement, which acknowledges that the identification of performance dimensions and their inter-relationships is a crucial first step in the design of performance measurement systems (Suwignjo, Bititci, and Carrie 2000). It could be suggested that the strategic management literature offers some insights. For example, the balanced scorecard, commonly used to assess organizational performance, provides a strategic performance measurement mapping tool (Kaplan and Norton 2000, 2001). However, its inability to map the complexities and contextual diversity of port settings presents a severe limitation. It is argued, therefore, that the issue in performance measurement in the port setting is not necessarily a problem with creating measures of performance (of which there are many); it is a problem of defining what performance is in a given context, as this rests on an understanding of causality which is underexplored. In terms of this review, it is argued that the gap in the literature lies not in our understanding of how to measure performance but in our understanding of what to measure in a given context.

CONCLUSIONS AND FUTURE RESEARCH IMPLICATIONS
The objective of this study was to conduct a systematic literature review of published studies on seaport performance measurement to identify, critically evaluate and integrate the various dimensions of seaport performance measurement. The systematic approach aimed to minimize researcher bias, with subjective decision making limited to decisions on search strategy and selection criteria. Further, it maximizes coverage of the studies reviewed; however, the review should not be treated as complete due to the possible omission of some studies. For example, a limitation of this review is that it excludes macro impact performance studies on seaports' spatial impact on their urban environment and integration in port cities; this may be attributable to decisions taken in creating the search strategy. With the exception of a limited number of studies (such as Woo and Pettit 2011), most of the literature on seaport performance has focused on unidimensional performance. Notwithstanding these limitations, this review provides a novel categorization of the literature that reflects the multidimensional nature of seaport performance. In line with the views of , seaport performance is multifaceted. We argue that when policy formation affects multiple stakeholders, it is necessary to assess seaport performance across a number of dimensions. The multidimensional nature of seaport performance necessitates a formative rather than a reflective approach to port performance measurement. From a policy perspective, this requires an identification of the components of port performance upon which a policy program depends. This study suggests that a limited understanding of how policy impacts performance in a seaport setting can seriously limit the capability of policymakers to do this (Vieira et al. 2013; Brooks et al. 2016).
We therefore conclude that future research needs to extend its focus on what causes performance in the seaport setting rather than simply focusing on the creation of seaport performance measures.