The pyqrse API reference
QRSEModel
-
class
pyqrse.model.model.QRSEModel(kernel='S', data=None, params=None, i_ticks=1000, i_stds=10, i_bounds=(-10, 10), about_data='', norm_data=False, **kwargs) The primary model container for working with QRSE models.
QRSEModel can be instantiated as either QRSEModel() or QRSE(). When working in a Jupyter notebook, the shorter version is generally preferable; when scripting, QRSEModel is preferred.
When instantiated, the QRSEModel attempts to find appropriate bounds of integration, self.i_bounds, by trying the following three methods in order:
1. Use the sufficient statistics of the data.
2. Identify the meaningful support of the kernel using the parameter values provided.
3. If neither data nor parameters are provided, use the defaults from the kernel.
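A minimal sketch of that fallback order, assuming the "sufficient statistics" are the sample mean and standard deviation and that the data-based bounds are mean ± i_stds standard deviations. The helper `choose_i_bounds` and its `kernel_support` argument are hypothetical stand-ins rather than pyqrse API, and the (-10, 10) fallback simply reuses the default i_bounds from the signature.

```python
import statistics

def choose_i_bounds(data=None, kernel_support=None,
                    kernel_default=(-10, 10), i_stds=10):
    """Hypothetical helper mirroring the three-step fallback for i_bounds."""
    # 1. Data available: center on the mean and extend i_stds standard deviations.
    #    (statistics.stdev needs at least two observations.)
    if data is not None and len(data) >= 2:
        mean = statistics.fmean(data)
        std = statistics.stdev(data)
        return (mean - i_stds * std, mean + i_stds * std)
    # 2. No data: use the support implied by the parameters, if one was found.
    if kernel_support is not None:
        return tuple(kernel_support)
    # 3. Neither: fall back to the kernel's default bounds.
    return tuple(kernel_default)

bounds = choose_i_bounds([1.0, 2.0, 3.0], i_stds=2)   # mean 2, std 1 -> (0.0, 4.0)
```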
Parameters: - kernel (str, object) – either a kernel code or a QRSE kernel class object, which includes any QRSE kernel. The default kernel is the SQRSEKernel. Available kernels can be listed by running:
>>> pyqrse.available_kernels()
- data (np.array or str or None) – If a 1d np.array, sets self.data to that array. If a str, loads the data from ‘path/to/data.csv’ using pandas.read_csv; depending on the format of the data, it may be necessary to pass pandas.read_csv keywords. If None, uses params to instantiate the model.
- params (np.array or None) – must be an np.array of the appropriate length.
- i_ticks (int) – the number of ticks in the grid of integration values. Default is 1000.
- i_stds (int) – the number of data standard deviations that i_bounds extends to. Default is 10.
- about_data (str) – saves notes about the data to the self.notes dictionary.
- norm_data (bool) – if True, normalizes the data. If the data are normalized, their sufficient statistics can be accessed at self.data_suff_stats.
- kwargs – optional keyword arguments for pandas.read_csv and
- If the model is instantiated without data, data should only be added using the add_data method.
- If the model is instantiated without data or params, model parameter values should be added using the self.setup_from_params method.
-
action_entropy(x) Entropy of the conditional action distribution
$H(p(a|x)) = -\sum_{i} p(a_i|x) \log p(a_i|x)$, for i = 1, 2 (binary) or i = 1, 2, 3 (ternary)
Action entropy is evaluated using self.params.
Parameters: x (float or np.array([float])) – value of data being tested Returns: float or np.array([float])
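As a worked example of the formula, the sketch below computes the entropy in nats for a binary action set. The logistic form of p(buy | x) and the names `action_entropy` and `logistic` are illustrative assumptions, not the kernel's actual conditional distribution.

```python
import math

def action_entropy(probs):
    """H = -sum_i p_i log p_i (in nats) for one conditional action distribution."""
    # Terms with p_i == 0 contribute nothing, since p log p -> 0 as p -> 0
    return -sum(p * math.log(p) for p in probs if p > 0)

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical binary case: p(buy | x) is a logistic function of x
x, temperature = 0.5, 1.0
p_buy = logistic(x / temperature)
h = action_entropy([p_buy, 1.0 - p_buy])   # lies between 0 and log(2)
```

The maximum, log(2) for two actions or log(3) for three, is reached when the actions are equally likely.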
-
actions action set
-
add_data(data, index_col=0, header=None, squeeze=True, silent=False, save_abs_path=False, norm_data=False, **kwargs) Primary means of adding data to model. It will set integration defaults according to the shape of the data.
Parameters: - data – either pandas.Series, np.ndarray, or “path/to/data”
- index_col – pandas.read_csv keyword argument
- header – pandas.read_csv keyword argument
- squeeze – pandas.read_csv keyword argument
- silent – no printing while running. default is False
- save_abs_path – if True, saves the absolute instead of the relative path to the data; useful when saving the object to a different location
- norm_data – True or False to normalize data
- kwargs – keyword arguments for pandas.read_csv
Returns:
-
aic(count_xi=True) Akaike information criterion
Parameters: count_xi (bool) – default is True. Count xi in the parameter count if kernel uses xi Returns: aic of model given data and parameter values
-
aicc(count_xi=True) Corrected Akaike information criterion (AICc) Returns: aicc of model given data
-
bic(count_xi=True) Bayesian information criterion (BIC) Returns: bic of model given data
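For reference, the textbook definitions in terms of the negative log likelihood (ln L = -nll) are sketched below, with k parameters and n observations. That pyqrse sets k = fn_params when count_xi is True and the kernel uses xi (and n_params otherwise) is an assumption based on the parameter descriptions above.

```python
import math

def aic(nll, k):
    # AIC = 2k - 2 ln L, with ln L = -nll
    return 2 * k + 2 * nll

def aicc(nll, k, n):
    # Small-sample correction of AIC; requires n > k + 1
    return aic(nll, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(nll, k, n):
    # BIC = k ln n - 2 ln L
    return k * math.log(n) + 2 * nll

# Example: nll = 100 at the optimum, k = 4 parameters, n = 50 observations
scores = (aic(100.0, 4), aicc(100.0, 4, 50), bic(100.0, 4, 50))   # 208.0, about 208.89, about 215.65
```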
-
code Identification code for the QRSEModel kernel
-
entropy(etype='joint') Entropy of the QRSEModel
Will find the joint H(x, a), conditional H(x|a), or marginal entropy H(x). Note that conditional entropy H(x|a) is different from the entropy of the conditional distribution at some value x, H(p(a|x)).
To find H(p(a|x)), use the .action_entropy method.
Parameters: etype (str or list(str)) – type of entropy returned. The options are ‘joint’, ‘cond’, or ‘marg’. If entered as a list, returns an array of entropy values Returns: float or np.array of floats
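To make the three options concrete, here is a grid approximation of each quantity, assuming the joint density factors as p(x, a) = p(x) p(a|x) and using H(x|a) = H(x, a) - H(a). `grid_entropies` is a hypothetical helper; pyqrse's own integration scheme may differ.

```python
import math

def grid_entropies(xs, px, pa_given_x):
    """Grid approximations (in nats) of the joint, conditional and marginal entropies.

    xs: evenly spaced grid; px: density p(x) on the grid;
    pa_given_x: one list of action probabilities p(a|x) per grid point.
    """
    dx = xs[1] - xs[0]
    # Marginal entropy H(x) = -∫ p(x) log p(x) dx
    h_marg = -sum(p * math.log(p) for p in px if p > 0) * dx
    # Joint entropy H(x, a) = -∫ Σ_a p(x, a) log p(x, a) dx with p(x, a) = p(x) p(a|x)
    h_joint = -sum(p * q * math.log(p * q)
                   for p, probs in zip(px, pa_given_x) for q in probs if p * q > 0) * dx
    # Action marginal p(a) = ∫ p(a|x) p(x) dx, then H(x|a) = H(x, a) - H(a)
    n_actions = len(pa_given_x[0])
    pa = [sum(p * probs[i] for p, probs in zip(px, pa_given_x)) * dx
          for i in range(n_actions)]
    h_a = -sum(q * math.log(q) for q in pa if q > 0)
    return {"joint": h_joint, "cond": h_joint - h_a, "marg": h_marg}

xs = [(i + 0.5) / 1000 for i in range(1000)]   # uniform p(x) = 1 on [0, 1]
h = grid_entropies(xs, [1.0] * len(xs), [[0.5, 0.5]] * len(xs))
# Actions are independent of x here, so h["cond"] equals h["marg"]
# and h["joint"] = h["marg"] + log 2
```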
-
evidence(data=None) Applies pdf to self.data. Parameters: data – Returns:
-
fit(**kwargs) Parameters: - data –
- params0 –
- summary –
- save –
- use_jac –
- weights –
- hist –
- check –
- silent –
- use_hess –
- smart_p0 –
- use_sp –
- kwargs – see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html
Returns:
-
fitter= None Controls model fitting for QRSEModel.
Allows fitting via Kullback-Leibler divergence minimization and maximum likelihood estimation; see pyqrse.fittools.optimizer.QRSEFitter.
-
fn_params Number of parameters in the model, including xi
If the model uses xi, adds one to n_params.
Returns: full number of parameters
-
fparams full parameter values
Appends xi to params if the kernel uses xi
Returns: np.array(float) - params.append(xi)
-
fpnames Full list of parameter names including xi
Appends xi to pnames if the kernel uses xi
Returns: pnames.append(xi)
-
fpnames_latex full latex formatted parameter names
Appends xi to pnames_latex if the kernel uses xi
Returns: pnames_latex.append(xi)
-
classmethod
from_pickle(path_to_pickle, trust_check=False, **kwargs) !!!DO NOT RUN THIS FUNCTION UNLESS YOU TRUST THE SOURCE WITH ABSOLUTE CERTAINTY!!!
Pickling is extremely convenient from a workflow perspective, as you can save the results of inquiries and instantly load them back into your python environment.
However, there are no safety checks on the code that will be run. That means:
- If you don’t trust the source, don’t unpickle it.
- Python will run all code in the pickle, malicious or not!
Parameters: - path_to_pickle – individual or list of paths to saved pickled QRSE objects
- args –
- kwargs –
- trust_check – prompts the user to verify that they trust the source of the file to be unpickled. The default in the signature is False.
Returns:
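A minimal sketch of a trust gate around pickle.load, using only the standard library. `load_pickle` and its `confirm` callback are hypothetical; they illustrate the idea behind trust_check, not pyqrse's actual prompt.

```python
import io
import pickle

def load_pickle(fileobj, trust_check=True, confirm=input):
    """Hypothetical loader that asks the user before unpickling anything."""
    if trust_check:
        answer = confirm("Do you trust the source of this pickle? [y/N] ")
        if answer.strip().lower() not in ("y", "yes"):
            raise PermissionError("refusing to unpickle a file from an untrusted source")
    # pickle.load can execute arbitrary code embedded in the file
    return pickle.load(fileobj)

# Round trip through an in-memory buffer; files opened with "wb" / "rb"
# behave the same way on disk
buffer = io.BytesIO()
pickle.dump({"notes": {"about_data": "example"}}, buffer)
buffer.seek(0)
restored = load_pickle(buffer, confirm=lambda prompt: "y")
```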
-
hess_fun(params) Value of the Hessian of the negative log likelihood
Parameters: params (np.array(float)) – model parameters Returns: np.array(float)(2d) value of Hessian at params
-
hess_inv_fun(params=None) Inverse Hessian of the negative log likelihood
Parameters: params (np.array(float)) – model parameters Returns: np.array(float)(2d) value of Inverse Hessian at params
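pyqrse obtains these derivatives with Autograd (see set_hess_inv). As a dependency-free stand-in, the sketch below swaps in central finite differences on a toy normal negative log likelihood parameterized by (mu, log_sigma); at the maximum likelihood estimate its Hessian is diag(n / sigma^2, 2n). All names here are hypothetical.

```python
import math

def numerical_hessian(f, params, h=1e-4):
    """Central-difference Hessian of a scalar function f at params."""
    n = len(params)
    hess = [[0.0] * n for _ in range(n)]

    def f_shifted(i, di, j, dj):
        p = list(params)
        p[i] += di
        p[j] += dj
        return f(p)

    for i in range(n):
        for j in range(n):
            hess[i][j] = (f_shifted(i, h, j, h) - f_shifted(i, h, j, -h)
                          - f_shifted(i, -h, j, h) + f_shifted(i, -h, j, -h)) / (4 * h * h)
    return hess

def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix, enough for a two-parameter example."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def gaussian_nll(params, data=(1.0, 2.0, 3.0)):
    """Toy normal NLL in (mu, log_sigma), dropping the constant (n/2) log(2 pi)."""
    mu, log_sigma = params
    sigma2 = math.exp(2 * log_sigma)
    return len(data) * log_sigma + sum((x - mu) ** 2 for x in data) / (2 * sigma2)

# MLE for the toy data: mu = 2, sigma^2 = 2/3
mle = [2.0, 0.5 * math.log(2.0 / 3.0)]
H = numerical_hessian(gaussian_nll, mle)   # approximately [[4.5, 0], [0, 6]]
H_inv = inverse_2x2(H)                     # approximately [[1/4.5, 0], [0, 1/6]]
```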
-
history()
-
i_bounds (min, max) bounds of integration
-
indifference(actions=(0, 1)) Returns: x s.t. p(buy | x) = p(sell | x)
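The indifference point is a root of p(a_0 | x) - p(a_1 | x). The sketch below finds it by bisection for a hypothetical logistic choice rule centred at mu, where the answer is x = mu; the logistic form, the bracket [-10, 10], and the helper names are assumptions, and pyqrse may use a different root finder.

```python
import math

def bisect_root(g, lo, hi, tol=1e-10, max_iter=200):
    """Return x in [lo, hi] with g(x) close to 0; g must change sign on [lo, hi]."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("g must change sign on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0 or hi - lo < tol:
            return mid
        if g_lo * g_mid < 0:
            hi = mid               # root lies in [lo, mid]
        else:
            lo, g_lo = mid, g_mid  # root lies in [mid, hi]
    return 0.5 * (lo + hi)

# Hypothetical binary choice rule: p(buy | x) is a logistic centred at mu
mu, temperature = 0.3, 0.5

def p_buy(x):
    return 1.0 / (1.0 + math.exp(-(x - mu) / temperature))

def p_sell(x):
    return 1.0 - p_buy(x)

x_star = bisect_root(lambda x: p_buy(x) - p_sell(x), -10.0, 10.0)   # close to mu
```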
-
jac_fun(params=None) Value of the Jacobian of the negative log likelihood
Parameters: params (np.array(float)) – model parameters Returns: np.array(float) value of the Jacobian at params
-
kernel_fun(x) Value of the unnormalized kernel function
kernel = exp(potential + entropy)
Evaluated at self.params.
Parameters: x (float or np.array([float])) – value of data being tested Returns: float or np.array([float])
-
log_kernel(x) Log of the unnormalized kernel function
log_kernel = potential + entropy
Evaluated at self.params.
Parameters: x (float or np.array([float])) – value of data being tested Returns: float or np.array([float])
-
log_partition(params=None) evaluate the log of the QRSE partition function numerically
Parameters: params (np.array) – if params are None, will use self.params, otherwise will evaluate at params. Returns: the value of the log partition function (float)
-
log_prior(params) The log of the prior distribution of parameter values
Used when fitting the model to data. By default it returns 0., which is equivalent to having no prior.
Can be overridden for an individual QRSEModel instance as follows:
Instantiate the instance of the QRSEModel:
qrse1 = QRSEModel('AT', data=data) # or qrse1 = QRSE('SF', data=data)
Define a new function for the prior:
import numpy as np

def new_log_prior(params):
    # self is not included like in normal methods
    # params will be a 1d np.array the same length as n_params
    # prior hyper parameters should be hardcoded into the function;
    # hyper_parameter_0, ... are placeholders for your own values
    hyper_parameters = np.array([hyper_parameter_0, hyper_parameter_1, hyper_parameter_2])
    # output of prior function should be negative to
    # penalize the likelihood function
    negative_squared_loss = -(params - hyper_parameters)**2
    return np.sum(negative_squared_loss)
Redefine the instance method to be the new function:
qrse1.log_prior = new_log_prior
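It is generally advised to change log_prior at the instance level. Changing it at the class level, i.e.:
QRSEModel.log_prior = new_log_prior
or
QRSE.log_prior = new_log_prior
will NOT work as intended, because a function assigned to a class becomes a method and receives the instance as its first argument in place of params.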
Also see pyqrse.model.model.QRSEModel.set_log_prior().
Parameters: params (np.array) – parameter values to evaluate Returns: float
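The class-level pitfall follows from ordinary Python method binding, which a toy stand-in class (hypothetical, not QRSEModel) can demonstrate: an instance attribute is called as a plain function, while a function stored on the class is bound and receives the instance as its first argument.

```python
class Model:
    """Toy stand-in for QRSEModel (hypothetical, for illustration only)."""

    def log_prior(self, params):
        return 0.0  # default: no prior

    def penalized_term(self, params):
        # Fitting code reaches the prior through the instance, like this
        return self.log_prior(params)

def new_log_prior(params):
    # Hypothetical prior centred at zero for every parameter
    return -sum(p ** 2 for p in params)

# Instance level: the function is stored on the instance and is not bound
m = Model()
m.log_prior = new_log_prior
instance_value = m.penalized_term([1.0, 2.0])   # -5.0

# Class level: the function becomes a method, so the instance arrives as `params`
Model.log_prior = new_log_prior
try:
    Model().penalized_term([1.0, 2.0])
    class_level_works = True
except TypeError:  # new_log_prior() takes 1 positional argument but 2 were given
    class_level_works = False
```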
-
log_prob(*args, **kwargs) Log probability = -1 * (negative log likelihood)
Takes the same arguments as nll(data=None, params=None, weights=None, use_sp=False).
Parameters: - data –
- params –
- weights –
- use_sp –
Returns:
-
logits(x, params=None) Parameters: - x –
- params –
Returns:
-
long_name= None Longer name of the model. By default, it is set to the full kernel name (e.g. Symmetric-QRSE).
-
mean(use_sp=True) Mean of the QRSE distribution
use_sp: If True (default), uses scipy integration; if False, computes over the grid of ticks. Returns: estimated mean of the distribution
-
mode(use_sp=True) Mode of the QRSE distribution
use_sp: If True (default), uses scipy maximization; if False, finds the optimum over the grid of ticks. Returns: estimated mode of the distribution
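When use_sp is False, these summaries come from a grid of ticks. A minimal sketch of that idea, covering the mean, the standard deviation (see std) and the mode, checked against a normal density with mean 1 and standard deviation 2. `grid_moments` is hypothetical, and the renormalization step is our choice rather than documented pyqrse behaviour.

```python
import math

def grid_moments(pdf, lo, hi, ticks=1000):
    """Mean, standard deviation and mode of a density evaluated on a grid of ticks."""
    dx = (hi - lo) / (ticks - 1)
    xs = [lo + i * dx for i in range(ticks)]
    ps = [pdf(x) for x in xs]
    mass = sum(ps) * dx  # renormalize the discretized density to total mass 1
    mean = sum(x * p for x, p in zip(xs, ps)) * dx / mass
    var = sum((x - mean) ** 2 * p for x, p in zip(xs, ps)) * dx / mass
    mode = max(zip(ps, xs))[1]  # grid point with the largest density
    return mean, math.sqrt(var), mode

def normal_pdf(x, mu=1.0, sigma=2.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

mean, std, mode = grid_moments(normal_pdf, -19.0, 21.0, ticks=4001)   # about 1, 2, 1
```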
-
n_actions
-
n_params Number of free model parameters. Does not include xi.
-
name= None Name of the model. By default, it is set to the abbreviated kernel name (e.g. S-QRSE).
-
nll(data=None, params=None, weights=None, use_sp=False) value of the negative log likelihood function
Parameters: - data –
- params –
- weights –
- use_sp –
Returns:
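The parameters above are undocumented, so the following sketch only shows the standard definition, nll = -Σ_i w_i log pdf(x_i), assuming weights act as per-observation multipliers (unit weights when None). log_prob is its negation. The function and the standard normal density are illustrative.

```python
import math

def nll(data, pdf, weights=None):
    """Negative log likelihood -sum_i w_i log pdf(x_i); unit weights by default."""
    if weights is None:
        weights = [1.0] * len(data)
    return -sum(w * math.log(pdf(x)) for x, w in zip(data, weights))

def std_normal(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

value = nll([0.0], std_normal)   # 0.5 * log(2 * pi), about 0.919
log_prob = -value                # log_prob is the negated nll
```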
-
notes= None A ‘notes’ dictionary for the model. Conveniently tracks things to remember about results; especially useful when pickling the object.
-
params np.array of parameter values
-
partition(params=None, use_sp=False) evaluate the QRSE partition function numerically
Parameters: - params (np.array) – if params are None, will use self.params, otherwise will evaluate at params.
- use_sp (bool) – If True, evaluates using scipy.integrate.quad over the range self.i_bounds. If False (default), evaluates over a grid of values by summing the value of the log_kernel over the grid, adjusting for step size, and taking exp(log_part). The grid method (False) tends to be quicker, and the loss of precision from numerical integration is generally negligible.
Returns: the value of the partition function (float)
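The grid method can be sketched as a rectangle-rule sum evaluated in log space. The log-sum-exp shift by the maximum is our choice for numerical stability, and `log_partition_grid` and `partition_grid` are hypothetical names; the check uses ∫ exp(-x^2/2) dx = sqrt(2π).

```python
import math

def log_partition_grid(log_kernel, lo, hi, ticks=1000):
    """log Z ~= log(sum_j exp(log_kernel(x_j)) * dx) on an evenly spaced grid."""
    dx = (hi - lo) / (ticks - 1)
    logs = [log_kernel(lo + j * dx) for j in range(ticks)]
    m = max(logs)  # subtract the maximum before exponentiating (log-sum-exp)
    return m + math.log(sum(math.exp(v - m) for v in logs)) + math.log(dx)

def partition_grid(log_kernel, lo, hi, ticks=1000):
    return math.exp(log_partition_grid(log_kernel, lo, hi, ticks))

# A kernel with a known integral: exp(-x^2 / 2) integrates to sqrt(2 * pi)
def gauss_log_kernel(x):
    return -0.5 * x * x

Z = partition_grid(gauss_log_kernel, -10.0, 10.0)   # about 2.5066
```

Dividing the kernel by Z, i.e. exp(log_kernel(x) - log Z), gives a density that integrates to one over the grid.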
-
pdf(x, params=None) Probability density function Parameters: - x – input value or np.array of values
- params – by default self.params, otherwise uses the given values as input
Returns: pdf value at x or an np.array([…]) of pdf values
-
plot(**kwargs) - plot(self, which=0, params=None, bounds=None, ticks=1000, showdata=True, bins=20, title=None, dcolor='w', seaborn=True, lw=2, pi=1., colors=None, color_order=None, show_legend=True):
Parameters: - which –
- params –
- bounds –
- ticks –
- showdata –
- bins –
- title –
- dcolor –
- seaborn –
- lw –
- pi –
- colors –
- color_order –
- show_legend –
Returns:
-
plotboth(**kwargs) plot marginal distribution side by side with :param args: :param kwargs: :return:
-
plotter= None Controls plotting for QRSEModel;
see pyqrse.utilities.plottools.QRSEPlotter
-
pnames parameter names
-
pnames_latex latex formatted parameter names
-
potential(x) Potential function of the kernel
Parameters: x (float or np.array([float])) – value of data being tested Returns: float or np.array([float])
-
res
-
reset_history()
-
rvs(n=1, bounds=None) random variable sampler using the interpolated inverse cdf method
rvs works as follows:
- Creates a grid approximation of pdf based on bounds.
- Estimates the cdf using this grid.
- Interpolates the inverse cdf using sp.interpolate
- Samples from uniform(0,1) distribution
- Enters uniform samples into inverse cdf function
Parameters: - n (int) – number of samples. Must be a positive integer. Default is 1, which returns a single sample. If n > 1, returns an n-length np.array of samples
- bounds (list, tuple or None) – e.g. [-10, 10, 10000] or (-10, 10, 10000) creates 10000 ticks between -10 and 10 as an estimate of the pdf of the model. If None (default), uses predetermined ticks based on the bounds of integration. bounds generally won’t need to be adjusted.
Returns: float or np.array([float])
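The five steps above can be sketched with the standard library, using a cumulative trapezoid estimate of the cdf and bisect-based linear interpolation in place of sp.interpolate. `make_sampler` is hypothetical, and unlike rvs it returns a list (not an np.array) when n > 1.

```python
import bisect
import math
import random

def make_sampler(pdf, lo, hi, ticks=10000, seed=None):
    """Inverse-cdf sampler built from a grid approximation of pdf on [lo, hi]."""
    rng = random.Random(seed)
    dx = (hi - lo) / (ticks - 1)
    xs = [lo + i * dx for i in range(ticks)]           # 1. grid approximation of the pdf
    ps = [pdf(x) for x in xs]
    cdf = [0.0]                                         # 2. cumulative trapezoid cdf
    for i in range(1, ticks):
        cdf.append(cdf[-1] + 0.5 * (ps[i - 1] + ps[i]) * dx)
    total = cdf[-1]
    cdf = [c / total for c in cdf]                      # normalize so the cdf ends at 1

    def inverse_cdf(u):                                 # 3. linear interpolation of x(u)
        j = bisect.bisect_left(cdf, u)
        if j == 0:
            return xs[0]
        if j >= ticks:
            return xs[-1]
        c0, c1 = cdf[j - 1], cdf[j]
        t = (u - c0) / (c1 - c0) if c1 > c0 else 0.0
        return xs[j - 1] + t * (xs[j] - xs[j - 1])

    def rvs(n=1):
        # 4. uniform(0, 1) draws, 5. mapped through the inverse cdf
        samples = [inverse_cdf(rng.random()) for _ in range(n)]
        return samples[0] if n == 1 else samples

    return rvs

def std_normal(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

rvs = make_sampler(std_normal, -10.0, 10.0, seed=0)
one = rvs()      # a single float
many = rvs(5)    # a list of five floats
```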
-
sampler= None Controls MCMC sampling for QRSEModel; see pyqrse.fittools.sampling.QRSESampler
-
save_history(new_hist=None)
-
set_hess_inv(from_res=False) Set model inverse Hessian
Primarily used to find the inverse Hessian for sampling.
Parameters: from_res (bool) – If False (default), uses Autograd to compute the Hessian. If True, uses the estimated value from the .fit() optimization routine
-
set_log_prior(new_log_prior) sets new log prior function so that it can access ‘self’
Alternative setter for pyqrse.model.model.QRSEModel.log_prior(). If accessing ‘self’ isn’t necessary, follow the instructions for that method.
Parameters: new_log_prior (function) – new prior function; must be of the form:
def new_log_prior(self, params):
    # complicated mathematics that
    # use params in addition to
    # self.attributes and/or self.methods
    return float_value_of_log_prior
- params must be an np.array of the appropriate length
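One standard way to give a replacement function access to `self` is to bind it to the instance with types.MethodType. Whether set_log_prior does exactly this is an assumption; the toy `Model` class and its `center` attribute are hypothetical.

```python
import types

class Model:
    """Toy stand-in for QRSEModel (hypothetical)."""

    def __init__(self, center):
        self.center = center

    def log_prior(self, params):
        return 0.0  # default: no prior

def new_log_prior(self, params):
    # Uses an instance attribute (self.center) as well as params
    return -sum((p - self.center) ** 2 for p in params)

m = Model(center=1.0)
# Bind the function to this one instance so `self` is supplied automatically
m.log_prior = types.MethodType(new_log_prior, m)
value = m.log_prior([1.0, 3.0])   # -(0 + 4) = -4.0
```

Other instances keep the default prior, since only `m` was changed.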
-
set_params(new_params, use_sp=True) Updates params and allows a choice of whether self.partition uses scipy (sp) or ticks
An alternative to using the self.params.setter that allows for choice of integration method
Parameters: - new_params (tuple, list, or np.ndarray) – new parameter values. Must be same length as params and should not include xi
- use_sp (bool) – If True (default), solve partition function using scipy.integrate.quad. If False, will use ticks. When set to False, is equivalent to using self.params = new_params
-
setup_from_params(parameters, start=2, imax=100, minmax=(2e-07, 4.5e-05), find_mode=True, stds=None) Find bounds of integration for a given parameterization
Will attempt to set model-wide variables appropriate to the given parameters. Does a binary search over the kernel values to find the points whose values are between the minmax bounds.
This function is not guaranteed to give correct results when the starting point is not the mode of the kernel or if the function is not monotonically decreasing away from the mode.
This function is only necessary when working without data since bounds of integration can be inferred from the data.
Parameters: - parameters (tuple, list, or np.array) – parameter values to initialize the model
- start (int or float) – if int, uses params[start] as the starting point; if float, uses that value; otherwise starts from 0.
- imax (int) – maximum number of steps before quitting search
- minmax (tuple) – min, max values of kernel, by default searches for the range (2e-07, 4.5e-05)
- find_mode (bool) – If True, searches for and begins from mode (default True)
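A minimal sketch of the search, assuming the kernel decreases monotonically away from the starting point: step outward with doubling steps until the kernel drops to minmax[1] or below, then bisect until a value lands inside minmax. `find_tail_point`, `direction`, and `step` are hypothetical, and the Gaussian test kernel is illustrative; running it in both directions yields a candidate (lower, upper) pair for i_bounds.

```python
import math

def find_tail_point(kernel, start=0.0, minmax=(2e-07, 4.5e-05), direction=1.0,
                    step=1.0, imax=100):
    """Search away from `start` (assumed to be the mode) for x with
    minmax[0] <= kernel(x) <= minmax[1], assuming kernel decreases monotonically."""
    k_min, k_max = minmax
    near, far = start, start + direction * step
    # Expand outward with doubling steps until the kernel is at or below k_max
    for _ in range(imax):
        if kernel(far) <= k_max:
            break
        step *= 2.0
        near, far = far, far + direction * step
    # Binary search: `near` stays above the band, `far` stays at or below k_max
    for _ in range(imax):
        mid = 0.5 * (near + far)
        value = kernel(mid)
        if k_min <= value <= k_max:
            return mid
        if value > k_max:
            near = mid
        else:
            far = mid
    raise RuntimeError("no point inside minmax found within imax steps")

def gauss_kernel(x):
    return math.exp(-0.5 * x * x)

lower = find_tail_point(gauss_kernel, direction=-1.0)
upper = find_tail_point(gauss_kernel, direction=1.0)   # candidate i_bounds (lower, upper)
```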
-
std(use_sp=True) Standard Deviation of the QRSE distribution
use_sp: If True (default), uses scipy integration; if False, computes over the grid of ticks. Returns: estimated standard deviation of the distribution
-
to_pickle(path_to_pickle) Uses the python pickle module to serialize the object (pickle it). Pickling allows it to be saved and reloaded later.
Parameters: path_to_pickle – Returns:
-
update_p0(data, weights=None, i_std=7)
-
use_xi does the model use xi?
-
xi mean of the data
QRSESampler
-
class
pyqrse.fittools.sampling.QRSESampler(qrse_model, chain_format='df') MCMC sampler for a QRSEModel (see QRSEModel.sampler)
-
a_rates acceptance rates for the sampler
-
chain
-
getdiff(parameter1, parameter2) Get the difference between the chains of two parameters
Parameters: - parameter1 – string name for p1 (e.g. ‘t_buy’)
- parameter2 – string name for p2 (e.g. ‘t_sell’)
Returns: np.ndarray
-
init(*args, **kwargs) Updates the sampler with recent activity of the QRSEModel() Returns:
-
marg_like
-
max_like()
-
max_params
-
mcmc(N=1000, burn=0, single=False, ptype='corr', s=1.0, update_hess=False, new=False, use_tqdm=True) Parameters: - N –
- burn –
- single –
- ptype –
- s –
- update_hess –
- new –
Returns:
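The mcmc parameters are undocumented, so the following is a generic one-dimensional random-walk Metropolis sampler, not pyqrse's implementation (whose proposals are controlled by ptype='corr' and s, and which can use the inverse Hessian, per set_hess_inv). It illustrates what N, burn, a proposal scale like s, and an acceptance rate such as a_rates typically mean; every name here is hypothetical.

```python
import math
import random

def metropolis(log_prob, x0, n=1000, burn=0, scale=1.0, seed=None):
    """Random-walk Metropolis for a 1-D target; returns (chain, acceptance rate)."""
    rng = random.Random(seed)
    x, lp = x0, log_prob(x0)
    chain, accepted = [], 0
    for i in range(n + burn):
        proposal = x + rng.gauss(0.0, scale)  # symmetric Gaussian proposal
        lp_new = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); 1 - random() is in (0, 1]
        if math.log(1.0 - rng.random()) < lp_new - lp:
            x, lp = proposal, lp_new
            accepted += 1
        if i >= burn:  # discard the first `burn` draws
            chain.append(x)
    return chain, accepted / (n + burn)

# Target: standard normal, known only up to its log density
chain, rate = metropolis(lambda x: -0.5 * x * x, 0.0, n=20000, burn=1000,
                         scale=2.4, seed=1)
```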
-
n_errors
-
next(sample_fun='joint', **kwargs)
-
plot(per_row=2, figsize=(12, 4), use_latex=True) Parameters: - per_row –
- figsize –
- use_latex –
Returns:
-
plotdiff(parameter1, parameter2, kind='hist', use_latex=True, figsize=None, **kwargs) Quickly view the difference between the chains of two parameters
Parameters: - parameter1 – string name for p1 (e.g. ‘t_buy’)
- parameter2 – string name for p2 (e.g. ‘t_sell’)
- kind – ‘hist’ for histogram or ‘line’ for time-series
- use_latex – use latex version of parameter names. default is True
- figsize – invokes plt.figure(figsize=figsize).
- kwargs – additional arguments for sns.distplot() and plt.plot()
Returns:
-
propose_new(params=None, ptype='corr', s=1.0)
-
set_params()
-
QRSEFitter
-
class
pyqrse.fittools.optimizer.QRSEFitter(the_model)
-
fit(data=None, params0=None, summary=False, save=True, use_jac=True, weights=None, hist=False, check=False, silent=True, use_hess=False, smart_p0=True, use_sp=True, **kwargs) Parameters: - data –
- params0 –
- summary –
- save –
- use_jac –
- weights –
- hist –
- check –
- silent –
- use_hess –
- smart_p0 –
- use_sp –
- kwargs – see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html
Returns:
-
history()
-
kld(params=None, target=None) Parameters: - params –
- target –
Returns:
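For reference, a grid approximation of the Kullback-Leibler divergence KL(p || q) = ∫ p(x) log(p(x)/q(x)) dx, checked against the closed form for two unit-variance normals, (mu_p - mu_q)^2 / 2. Which argument plays the role of the target, and therefore the direction of the divergence pyqrse minimizes, is an assumption here; `kld_grid` and `normal` are hypothetical helpers.

```python
import math

def kld_grid(p, q, lo, hi, ticks=20001):
    """Grid approximation of KL(p || q) = integral of p(x) log(p(x) / q(x)) dx."""
    dx = (hi - lo) / (ticks - 1)
    total = 0.0
    for i in range(ticks):
        x = lo + i * dx
        px, qx = p(x), q(x)
        if px == 0.0:
            continue              # 0 * log(0 / q) is taken as 0
        if qx == 0.0:
            return math.inf       # p puts mass where q has none
        total += px * math.log(px / qx)
    return total * dx

def normal(mu, sigma):
    def pdf(x):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return pdf

# Closed form for unit variances: KL(N(0, 1) || N(1, 1)) = (0 - 1)^2 / 2 = 0.5
d = kld_grid(normal(0.0, 1.0), normal(1.0, 1.0), -12.0, 13.0)
```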
-
klmin(target=None, save=True, use_jac=True, **kwargs) Parameters: - target –
- save –
- use_jac –
- kwargs –
Returns:
-
reset_history()
-
save_history(new_hist=None)
-
set_kl_target(target) Parameters: target – Returns:
-
update_model()
-
QRSEPlotter
-
class
pyqrse.utilities.plottools.QRSEPlotter(qrse_object, colors=None, color_order=None) Plotting tools for QRSE models; see also QRSEModel.plotter.
-
colors
-
plot(which=0, params=None, bounds=None, ticks=1000, showdata=True, bins=20, title=None, dcolor='w', seaborn=True, lw=2, pi=1.0, colors=None, color_order=None, show_legend=True)
Parameters: - which –
- params –
- bounds –
- ticks –
- showdata –
- bins –
- title –
- dcolor –
- seaborn –
- lw –
- pi –
- colors –
- color_order –
- show_legend –
Returns:
-
plotboth(*args, **kwargs) plot marginal distribution side by side with :param args: :param kwargs: :return:
-
set_color_order(color_order=None, output=False)
-
set_colors(colors=None, output=False)
-