Our results show that using the Dutch multiple-choice knowledge test in a neighboring country is feasible. Firstly, more than 80 percent of the Dutch items were considered appropriate in Flanders. This suggests that the core of the knowledge base of general practice is much the same in the two countries. Secondly, although we had to overcome some resistance among the GP trainers, the great majority of Flemish participants liked the test and thought that its items represent core knowledge of the discipline. The major criticism was that some questions were worded in language that was “too Dutch” for the Flemish participants. This suggests that the items were reviewed by senior clinicians who understood the Dutch wording, whereas the Belgian (Flemish) students did not grasp the full meaning of some words in the items. Although the citizens of Flanders and the Netherlands speak almost the same language, in future we need to pay closer attention to the technical idioms of each language. When expanding efforts to countries with another language, this issue needs even more consideration, but evidence from the undergraduate curriculum suggests that in the majority of cases only a small amount of work is needed to adapt questions.
Can this bilateral collaboration be sustained? At present, the Dutch test is being introduced as a formative instrument and will be offered to all trainees and teachers in general practice in Belgium (Flanders). More experience with the procedures and results will probably help to overcome initial resistance to 'foreign' test formats. In future cooperation, Flemish teachers need to contribute more to the process of item banking. This academic commitment would help to establish a sense of ownership. Although Dutch is the language of both countries, some idioms are not readily transferable, and this needs attention in the future. As Dutch is a minority language, the discussion of language should, in the broader European context, also consider the possibility of using English, although this would pose similar problems with idioms.
We used aggregated group scores and did not report scores per ICPC domain because the number of participants was too small to give meaningful evidence. In aggregate, the Flemish GP-students and GP-trainers scored lower than their Dutch counterparts. This key finding must be interpreted with care and needs further work. Dutch GP trainees are a few years older than their Flemish counterparts and may have had more clinical experience, as they tend to work between graduation and the start of vocational training.
In the present study, the Flemish students had only a brief preparation period of about a month, and we have no data on whether they consulted the available guidelines. Furthermore, the unfamiliar test format, the differences in the GP-training program and, in some items, typically Dutch idioms could explain their results. Finally, we applied only one knowledge test, which hampers interpretation of the observed differences between the Dutch and Belgian (Flemish) participants.
Individual scores cannot be used for high-stakes decisions such as pass or fail for the Flemish participants. One finding, however, suggests construct validity of the test for examination purposes in Flanders. At one university (University Two), year 1 trainees were urged to study the abstracts of the national guidelines, for which they could subsequently earn a credit. Their scores were significantly higher than those of the other Flemish students of that year. This finding reassures us to some extent that the content of the test covers the guidelines and that actively learning these does improve the score. However, this subgroup consisted of only 23 students, and the finding should therefore be interpreted carefully. To explore this further, we should look at concurrent validity with other existing test formats in Flanders, such as the knowledge test of Flanders, the Objective Clinical Examination and performance indicators at the training practices.
In the Netherlands we used routine data that were collected in a real examination context. In Belgium (Flanders) we relied on voluntary participants. This explains the lower participation rate per subgroup in Belgium (Flanders). Furthermore, we do not know whether the scores of these participants are representative of their group.
The educational benefits of collaborative assessment are potentially huge. Firstly, collaboration between countries with comparable health systems and a comparable scope of general practice would increase the efficiency of assessment procedures. Sharing the costs of data banking and analysis, and improving the quality and quantity of test item production, could be a valuable asset for participating training centers. This can help to overcome the challenge of renewing items and prevent item banks from becoming outdated.
Secondly, test results can be used to assess the progress of trainees in general practice. Kramer et al. were able to show an increase in proficiency during the training period using a written test to assess knowledge [22, 23]. There is already a body of evidence in the undergraduate curriculum underpinning the formative and summative use of knowledge tests over time. Norman et al. argue that feedback on performance steers learning behavior at the undergraduate level. Evidence of longitudinal educational effects of international use in post-graduate training, however, is non-existent at present.
Thirdly, international benchmarking of educational programs using aggregated group scores is an interesting avenue. This was shown to be feasible between medical schools in the undergraduate curriculum. Benchmarking can inform course designers on how to improve their programs. For this, the test formats need to be well accepted by course designers, teachers and students alike. Future work can establish the common ground if countries with different health systems collaborate. The example of the technical specialty of radiotherapy can be of guidance.
In this study, the costs for the donor country amounted to about 4 full-time equivalents (FTE) of faculty time and 1.5 FTE of technical assistance. To date, the initial costs for Flanders have been implicit: a single junior researcher worked part time on the project for 6 months with some academic supervision. If the project is prolonged, the Flemish inter-university consortium will need to put more effort into item construction, which would lead to higher academic costs. These can be expected to be far lower than the costs of setting up a high-quality, country-specific Flemish system.