Image denoising has a profound impact on the precision of estimated parameters in diffusion kurtosis imaging (DKI). This work first proposes an approach to constructing a DKI phantom that can be used to evaluate the performance of denoising algorithms in regard to their abilities of improving the reliability of DKI parameter estimation. The phantom was constructed from a real DKI dataset of a human brain, and the pipeline used to construct the phantom consists of diffusion-weighted (DW) image filtering, diffusion and kurtosis tensor regularization, and DW image reconstruction. The phantom preserves the image structure while minimizing image noise, and thus can be used as ground truth in the evaluation. Second, we used the phantom to evaluate three representative algorithms of non-local means (NLM). Results showed that one scheme of vector-based NLM, which uses DWI data with redundant information acquired at different b-values, produced the most reliable estimation of DKI parameters in terms of Mean Square Error (MSE), Bias and standard deviation (Std). The result of the comparison based on the phantom was consistent with those based on real datasets.