In robotics, elementary behaviour patterns often amount to control-theoretic problems. Because the model of the controlled system is incomplete or imprecise, neither the structure nor the parameters of a suitable control policy are known in advance. Such problems can be solved by reinforcement learning algorithms such as policy gradient methods. In this thesis, policy gradient learning is used to optimise a controller represented as a rational function in the z-domain. This representation facilitates the simultaneous optimisation of the controller's structure and its parameters in the time domain. The resulting controller can be analysed with the tools of control theory to predict its behaviour in arbitrary scenarios. Because the performance of gradient descent heavily depends on an appropriate starting point, the initial parameters must be chosen carefully. This work presents a method for learning an initial parameter set from a single demonstrated trajectory. The approach is evaluated on a cartpole simulation to demonstrate the expressiveness of the policy. We also describe how the gradient descent can be stabilised by introducing a linearisation term. Furthermore, a real soccer robot scenario demonstrates that the proposed approach copes with noisy conditions, illustrating its flexibility and adaptability to different problems given only little initial knowledge. A discussion of open questions and concluding remarks finally motivates future work and possible extensions of the proposed approach.
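To illustrate the kind of policy the abstract refers to, the following is a minimal sketch (not taken from the thesis; all names are illustrative) of a controller given by a rational transfer function in z, C(z) = (b0 + b1·z⁻¹ + …)/(1 + a1·z⁻¹ + …), evaluated in the time domain as a linear difference equation. The coefficients b and a would be the parameters tuned by policy gradient learning:

```python
class RationalController:
    """Hypothetical sketch of a controller represented as a rational
    function in z, applied in the time domain:
        u[k] = sum_i b_i * e[k-i] - sum_j a_j * u[k-j]
    The coefficient vectors b and a are the learnable parameters."""

    def __init__(self, b, a):
        # b: numerator coefficients [b0, b1, ...]
        # a: denominator coefficients [a1, a2, ...] (leading 1 is implicit)
        self.b = list(b)
        self.a = list(a)
        self.e_hist = [0.0] * len(self.b)  # past error inputs e[k-i]
        self.u_hist = [0.0] * len(self.a)  # past outputs u[k-j]

    def step(self, error):
        # shift in the newest error sample
        self.e_hist = [error] + self.e_hist[:-1]
        # feed-forward part from past errors
        u = sum(bi * ei for bi, ei in zip(self.b, self.e_hist))
        # recursive part from past outputs
        u -= sum(aj * uj for aj, uj in zip(self.a, self.u_hist))
        if self.a:
            self.u_hist = [u] + self.u_hist[:-1]
        return u

# A pure proportional controller is the special case C(z) = b0:
p = RationalController(b=[0.5], a=[])
```

Because both the lengths of b and a (the structure) and their values (the parameters) enter the same representation, structure and parameters can in principle be optimised together, which is the point made in the abstract.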