pyqrse.model package¶

Submodules¶

pyqrse.model.model module¶

class pyqrse.model.model.QRSEModel(kernel='S', data=None, params=None, i_ticks=1000, i_stds=10, i_bounds=(-10, 10), about_data='', norm_data=False, **kwargs)¶

Bases: pyqrse.utilities.mixins.HistoryMixin, pyqrse.utilities.mixins.PickleMixin

The primary model container for working with QRSE Models

QRSEModel can be instantiated as either QRSEModel() or QRSE(). When working in a jupyter notebook, the shorter version is generally preferable. However, when scripting, QRSEModel is preferred.

When instantiated the QRSEModel attempts to find the appropriate bounds of integration, self.i_bounds, by trying the following three methods in order:

1. Use the sufficient statistics of the data.

2. Identify the meaningful support of the kernel using the parameter values provided.

3. If no data or parameters are provided, the model will use defaults from the kernel.
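As a rough illustration of the first method, the sketch below derives bounds from the sample mean and standard deviation. It assumes the bounds are centered on the mean and extend i_stds standard deviations in each direction. The helper name bounds_from_data is hypothetical, and this is not necessarily how pyqrse computes self.i_bounds.

```python
import statistics


def bounds_from_data(data, i_stds=10):
    """Hypothetical helper: (min, max) bounds of integration from data.

    Assumes the bounds are mean +/- i_stds population standard deviations.
    """
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)
    return (mean - i_stds * std, mean + i_stds * std)


data = [-1.0, 0.0, 1.0]
lo, hi = bounds_from_data(data, i_stds=10)
```

With symmetric data such as this, the resulting bounds are symmetric around zero.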

Parameters:

kernel (str, object) – can either be a kernel code or QRSE kernel class object, which includes any QRSE kernel. The default kernel is the SQRSEKernel. Available kernels can be seen by running:
```
>>> pyqrse.available_kernels()
```
data (np.array or str or None) – If a 1d np.array, will set self.data to that array. If str, will load data from that path (e.g. ‘path/to/data.csv’) using pandas.read_csv. If None, will use params to instantiate the model. Depending on the format of the data, it may be necessary to pass pandas.read_csv keywords.
params (np.array or None) – must be an np.array of the appropriate length.
i_ticks (int) – the number of ticks in grid of integration values. default is 1000.
i_stds (int) – the number of data standard deviations to set i_bounds to. default is 10.
about_data (str) – saves notes about the data to the self.notes dictionary.
norm_data (bool) – if True, will normalize data. If data is normalized, the data sufficient statistics can be accessed at self.data_suff_stats
kwargs – optional keyword arguments for pandas.read_csv and ~

If the model is instantiated without data, data should only be added using the add_data method.

If the model is instantiated without data or params, model parameter values should be added using the self.setup_from_params method.

action_entropy(x)¶

Entropy of conditional action distribution

H(p(a|x)) = -SUM p(a_i|x) log p(a_i|x) for i=1,2 (binary) or i=1,2,3 (ternary)

action entropy is evaluated using self.params

Parameters:	x (float or np.array([float])) – value of data being tested
Returns:	float or np.array([float])
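To make the quantity concrete, here is a minimal sketch of the entropy of a discrete action distribution. The logistic form of p(a|x) and the names mu and temperature are assumptions made for illustration only; the actual conditional distribution depends on the kernel and on self.params.

```python
import math


def discrete_entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution.

    probs: probabilities p(a_i|x), one per action, summing to 1.
    Terms with p = 0 contribute 0 by convention.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def binary_logit_probs(x, mu=0.0, temperature=1.0):
    """Hypothetical binary conditional action distribution, logistic in x."""
    p_first = 1.0 / (1.0 + math.exp(-(x - mu) / temperature))
    return [p_first, 1.0 - p_first]


h_at_mu = discrete_entropy(binary_logit_probs(0.0))  # both actions equally likely
h_far = discrete_entropy(binary_logit_probs(5.0))    # one action dominates
```

The entropy is largest where the actions are equally likely and falls toward zero as one action becomes near certain.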

actions¶: action set

add_data(data, index_col=0, header=None, squeeze=True, silent=False, save_abs_path=False, norm_data=False, **kwargs)¶

Primary means of adding data to model. It will set integration defaults according to the shape of the data.

Parameters:

data – either pandas.Series, np.ndarray, or “path/to/data”
index_col – pandas.read_csv keyword argument
header – pandas.read_csv keyword argument
squeeze – pandas.read_csv keyword argument
silent – no printing while running. default is False
save_abs_path – if True, will save the absolute instead of the relative path to the data. Useful if saving the object to a different location
norm_data – True or False to normalize data
kwargs – keyword arguments for pandas.read_csv

Returns:

aic(count_xi=True)¶

Akaike information criterion

Parameters:	count_xi (bool) – default is True. Count xi in the parameter count if kernel uses xi
Returns:	aic of model given data and parameter values

aicc(count_xi=True)¶

Returns:	aicc of model given data

bic(count_xi=True)¶

Returns:	bic of model given data
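For reference, the textbook definitions of the three criteria can be sketched as follows. This assumes the standard formulas in terms of the negative log likelihood; the function name information_criteria is hypothetical, and pyqrse's own implementation may differ in detail (for example, in how count_xi changes the parameter count k).

```python
import math


def information_criteria(nll, k, n):
    """Textbook AIC, AICc and BIC from a negative log likelihood.

    nll: negative log likelihood at the fitted parameters
    k:   number of parameters counted (e.g. including xi if count_xi)
    n:   number of observations (AICc requires n > k + 1)
    """
    aic = 2.0 * k + 2.0 * nll
    aicc = aic + (2.0 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) + 2.0 * nll
    return aic, aicc, bic


aic, aicc, bic = information_criteria(nll=100.0, k=4, n=50)
```

Lower values indicate a better trade-off between fit and parameter count under each criterion.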

code¶: QRSEModel Identification code for the Kernel

entropy(etype='joint')¶

Entropy of the QRSEModel

Will find the joint H(x, a), conditional H(x|a), or marginal entropy H(x). Note that conditional entropy H(x|a) is different from the entropy of the conditional distribution at some value x, H(p(a|x)).

To find H(p(a|x)) use the .action_entropy method:

pyqrse.model.model.QRSEModel.action_entropy()

Parameters:	etype (str or list(str)) – type of entropy returned. The options are ‘joint’, ‘cond’, or ‘marg’. If entered as a list, will return an array of entropy values
Returns:	float or np.array of floats

evidence(data=None)¶

applies pdf to self.data

Parameters:	data –
Returns:

fit(**kwargs)¶

Parameters:	data – params0 – summary – save – use_jac – weights – hist – check – silent – use_hess – smart_p0 – use_sp – kwargs – see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html
Returns:

fitter = None¶

controls model fitting for QRSEModel

allows fitting via Kullback-Leibler Distance Minimization and maximum likelihood estimation. See pyqrse.fittools.optimizer.QRSEFitter

fn_params¶

Number of parameters in the model including xi

if the model uses xi, add one to n_params

Returns:	full number of parameters

fparams¶

full parameter values

Appends xi to params if the kernel uses xi

Returns:	np.array(float) - params.append(xi)

fpnames¶

Full list of parameter names including xi

Appends xi to pnames if the kernel uses xi

Returns:	np.array(float) - pnames.append(xi)

fpnames_latex¶

full latex formatted parameter names

Appends xi to pnames_latex if the kernel uses xi

Returns:	pnames_latex.append(xi)

hess_fun(params)¶

Value of the Hessian of the negative log likelihood

Parameters:	params (np.array(float)) – model parameters
Returns:	np.array(float)(2d) value of Hessian at params

hess_inv_fun(params=None)¶

Inverse Hessian of the negative log likelihood

Parameters:	params (np.array(float)) – model parameters
Returns:	np.array(float)(2d) value of Inverse Hessian at params

i_bounds¶: (min, max) of bounds of integration

indifference(actions=(0, 1))¶

Returns:	x s.t. p(buy | x) = p(sell | x)

jac_fun(params=None)¶

Value of the jacobian of the negative log likelihood

Parameters:	params (np.array(float)) – model parameters
Returns:	np.array(float) value of jacobian at params

kernel_fun(x)¶

Value of the unnormalized kernel function

kernel = exp(potential + entropy)

evaluated at self.params

Parameters:	x (float or np.array([float])) – value of data being tested
Returns:	float or np.array([float])

log_kernel(x)¶

Log of the unnormalized kernel function

log_kernel = potential + entropy

evaluated at self.params

Parameters:	x (float or np.array([float])) – value of data being tested
Returns:	float or np.array([float])

log_partition(params=None)¶

evaluate the log of the QRSE partition function numerically

Parameters:	params (np.array) – if params are None, will use self.params, otherwise will evaluate at params.
Returns:	the value of the log partition function (float)

log_prior(params)¶

The log of the prior distribution of parameter values

Used for fitting the model to data. By default returns 0., which is equivalent to having no prior.

Can be overridden for an individual QRSEModel instance as follows:

Instantiate the instance of the QRSEModel:

```
qrse1 = QRSEModel('AT', data=data)

# or

qrse1 = QRSE('SF', data=data)
```

Define a new function for the prior:

```
def new_log_prior(params):

    # self is not included like in normal methods
    # params will be a 1d np.array the same length as n_params

    # prior hyper parameters should be hardcoded into function

    hyper_parameters = [hyper_parameter_0,
                        hyper_parameter_1,
                        hyper_parameter_2]

    # output of prior function should be negative to
    # penalize likelihood function

    negative_squared_loss = -(params - hyper_parameters)**2
    return np.sum(negative_squared_loss)
```

Redefine the instance method to be the new function:
```
qrse1.log_prior = new_log_prior
```

It is generally advised to change log_prior at the instance level. Changing it at the class level, i.e.:
```
QRSEModel.log_prior = new_log_prior
```
or
```
QRSE.log_prior = new_log_prior
```
will NOT work as intended.

Also, see pyqrse.model.model.QRSEModel.set_log_prior()

Parameters:	params (np.array) – parameter values to evaluate
Returns:	float

log_prob(*args, **kwargs)¶

log probability = -1 * ( negative log likelihood )

nll(self, data=None, params=None, weights=None, use_sp=False)

Parameters:	data – params – weights – use_sp –
Returns:

logits(x, params=None)¶

Parameters:	x – params –
Returns:

long_name = None¶: longer name of the model. By default, it is set to the full kernel name (e.g. Symmetric-QRSE)

mean(use_sp=True)¶

Mean of the QRSE distribution

Parameters:	use_sp (bool) – if False, will find optimum over grid of ticks. If True (default), will use scipy integrate/maximize
Returns:	estimated mean of the distribution

mode(use_sp=True)¶

Mode of the QRSE distribution

Parameters:	use_sp (bool) – if False, will find optimum over grid of ticks. If True (default), will use scipy integrate/maximize
Returns:	estimated mode of the distribution

n_actions¶

n_params¶: number of free model parameters. does not include xi.

name = None¶: name of the model. By default, it is set to the abbreviated kernel name (e.g. S-QRSE)

nll(data=None, params=None, weights=None, use_sp=False)¶

value of the negative log likelihood function

Parameters:	data – params – weights – use_sp –
Returns:

notes = None¶: A ‘notes’ dictionary for the model. Conveniently track things to remember about results. Especially useful when pickling the object

params¶: np.array of parameter values

partition(params=None, use_sp=False)¶

evaluate the QRSE partition function numerically

Parameters:

params (np.array) – if params are None, will use self.params, otherwise will evaluate at params.
use_sp (bool) – If True, will evaluate using scipy.integrate.quad over the range self.i_bounds. If False (default), will evaluate over a grid of values by summing the value of the log_kernel over the grid, adjusting for step size, and taking exp(log_part). The grid method (False) tends to be quicker, and the loss of precision in the numerical integration is generally negligible.

Returns:

the value of the partition function (float)
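The grid method described above can be sketched as a Riemann sum carried out in log space. This is a minimal illustration, not pyqrse's implementation: the function name log_partition_grid, the log-sum-exp stabilization, and the Gaussian test kernel are all assumptions, while i_bounds and i_ticks simply mirror the constructor defaults.

```python
import math


def log_partition_grid(log_kernel, i_bounds=(-10.0, 10.0), i_ticks=1000):
    """Approximate log Z, where Z is the integral of exp(log_kernel(x)).

    Uses a Riemann sum over an evenly spaced grid, with the log-sum-exp
    trick so that large log_kernel values do not overflow.
    """
    lo, hi = i_bounds
    step = (hi - lo) / (i_ticks - 1)
    log_values = [log_kernel(lo + i * step) for i in range(i_ticks)]
    peak = max(log_values)
    total = sum(math.exp(v - peak) for v in log_values)
    # log(sum of exp(log_values) * step)
    return peak + math.log(total) + math.log(step)


def gaussian_log_kernel(x):
    """Unnormalized standard normal, used only as a test kernel."""
    return -0.5 * x * x


log_z = log_partition_grid(gaussian_log_kernel)
z = math.exp(log_z)  # the exact value for this kernel is sqrt(2 * pi)
```

For a smooth kernel whose mass lies well inside the bounds, the grid sum is typically very close to the exact integral, which is consistent with the remark about negligible loss of precision.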

pdf(x, params=None)¶

Probability Density Function

Parameters:

x – input value or np.array of values
params – by default self.params otherwise use values as input

Returns:

pdf value at x or an np.array([…]) of pdf values

plot(**kwargs)¶

plot(self, which=0, params=None, bounds=None, ticks=1000, showdata=True, bins=20, title=None, dcolor='w', seaborn=True, lw=2, pi=1., colors=None, color_order=None, show_legend=True)

Parameters:	which – params – bounds – ticks – showdata – bins – title – dcolor – seaborn – lw – pi – colors – color_order – show_legend –
Returns:

plotboth(**kwargs)¶

plot marginal distribution side by side with

Parameters:	args – kwargs –
Returns:

plotter = None¶

controls plotting for QRSEModel

see pyqrse.utilities.plottools.QRSEPlotter

pnames¶: parameter names

pnames_latex¶: latex formatted parameter names

potential(x)¶

potential function of the kernel

Parameters:	x (float or np.array([float])) – value of data being tested
Returns:	float or np.array([float])

res¶

rvs(n=1, bounds=None)¶

random variable sampler using the interpolated inverse cdf method

rvs works as follows:

1. Creates a grid approximation of pdf based on bounds.

2. Estimates the cdf using this grid.

3. Interpolates the inverse cdf using sp.interpolate.

4. Samples from uniform(0,1) distribution.

5. Enters uniform samples into inverse cdf function.

Parameters:

n (int) – number of samples. Must be a positive integer. Default is 1, which returns a single sample. If n > 1, returns an n length np.array of samples
bounds (list, tuple or None) – [-10, 10, 10000] / (-10, 10, 10000) creates 10000 ticks between -10 and 10 as an estimate of the pdf of the Model. If None (default), will use predetermined ticks based on the bounds of integration. It generally won’t need to be adjusted

Returns:

float or np.array([float])
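The interpolated inverse cdf method listed above can be sketched in plain Python. This is an illustration under assumptions, not pyqrse's code: the name make_sampler is hypothetical, linear interpolation is done by hand with bisect rather than with sp.interpolate, and the standard normal pdf is only a stand-in for the model pdf.

```python
import bisect
import math
import random


def make_sampler(pdf, bounds=(-10.0, 10.0, 10000), seed=None):
    """Return a function that samples from pdf via an interpolated inverse cdf."""
    lo, hi, ticks = bounds
    step = (hi - lo) / (ticks - 1)
    xs = [lo + i * step for i in range(ticks)]
    # Grid approximation of the pdf, accumulated into a cdf.
    cdf, running = [], 0.0
    for x in xs:
        running += pdf(x) * step
        cdf.append(running)
    cdf = [c / running for c in cdf]  # normalize so the cdf ends at 1
    rng = random.Random(seed)

    def sample():
        u = rng.random()  # uniform(0, 1) draw
        # Locate the grid cell with cdf[j - 1] < u <= cdf[j].
        j = bisect.bisect_left(cdf, u)
        if j == 0:
            return xs[0]
        # Linear interpolation of the inverse cdf within that cell.
        frac = (u - cdf[j - 1]) / (cdf[j] - cdf[j - 1])
        return xs[j - 1] + frac * (xs[j] - xs[j - 1])

    return sample


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


sample = make_sampler(std_normal_pdf, bounds=(-10.0, 10.0, 2001), seed=0)
draws = [sample() for _ in range(5000)]
sample_mean = sum(draws) / len(draws)
```

Building the cdf once and reusing it makes each draw cheap, which is the main appeal of this approach when many samples are needed.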

sampler = None¶: controls mcmc sampling for QRSEModel. See pyqrse.fittools.sampling.QRSESampler

set_hess_inv(from_res=False)¶

Set model inverse Hessian

primarily used to find the inverse Hessian for Sampling

Parameters:	from_res (bool) – If False (default), will use Autograd to find the Hessian. If True, will use the estimated value from the .fit() optimization routine

set_log_prior(new_log_prior)¶

sets new log prior function so that it can access ‘self’

Alternative setter for pyqrse.model.model.QRSEModel.log_prior(). If accessing ‘self’ isn’t necessary, follow the instructions for that method.

Parameters:

new_log_prior (function) – new prior function. Must be of the form:

```
def new_log_prior(self, params):
    # complicated mathematics that
    # use params in addition to
    # self.attributes and/or self.methods
    return float_value_of_log_prior
```

params must be an np.array of the appropriate length
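One common way to give a replacement function access to ‘self’ on a single instance is to bind it with types.MethodType. The sketch below shows that pattern on a hypothetical stand-in class (TinyModel, with a made-up scale attribute); it is an assumption about the general technique, not a description of how pyqrse implements set_log_prior.

```python
import types


class TinyModel:
    """Hypothetical stand-in for a model with an overridable log prior."""

    def __init__(self, scale):
        self.scale = scale

    def log_prior(self, params):
        return 0.0  # default: equivalent to having no prior

    def set_log_prior(self, new_log_prior):
        # Bind the function to this instance so it receives `self`.
        self.log_prior = types.MethodType(new_log_prior, self)


def new_log_prior(self, params):
    # Penalize large parameters, scaled by an attribute of the instance.
    return -self.scale * sum(p * p for p in params)


model = TinyModel(scale=0.5)
model.set_log_prior(new_log_prior)
value = model.log_prior([1.0, 2.0])  # -0.5 * (1 + 4)

other = TinyModel(scale=0.5)
untouched = other.log_prior([1.0, 2.0])  # other instances keep the default
```

Binding at the instance level leaves the class and every other instance unchanged.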

set_params(new_params, use_sp=True)¶

Updates params and allows a choice of whether self.partition uses scipy (sp) or ticks

An alternative to using the self.params.setter that allows for choice of integration method

Parameters:

new_params (tuple, list, or np.ndarray) – new parameter values. Must be same length as params and should not include xi
use_sp (bool) – If True (default), solve partition function using scipy.integrate.quad. If False, will use ticks. When set to False, is equivalent to using self.params = new_params

setup_from_params(parameters, start=2, imax=100, minmax=(2e-07, 4.5e-05), find_mode=True, stds=None)¶

Find bounds of integration for a given parameterization

Will attempt to set model-wide variables appropriate to the model given the parameters. Does a binary search over the kernel values to find the points whose value is between the minmax bounds.

This function will not guarantee results when the starting point is not the mode of the kernel or if the function is not monotonically decreasing away from the mode.

This function is only necessary when working without data since bounds of integration can be inferred from the data.

Parameters:

parameters (tuple, list, or np.array) – parameter values to initialize the model
start (int or float) – starting point of the search. If int, uses the value at that index of params; if float, uses that value; otherwise uses 0.
imax (int) – maximum number of steps before quitting search
minmax (tuple) – min, max values of kernel, by default searches for the range (2e-07, 4.5e-05)
find_mode (bool) – If True, searches for and begins from mode (default True)
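The search described above can be sketched as an expand-then-bisect loop on one side of the mode. This is only an illustration under assumptions: the name find_bound, the direction argument, and the doubling step are hypothetical, the kernel is assumed to decrease monotonically away from start, and imax and minmax simply mirror the defaults in the signature.

```python
import math


def find_bound(kernel, start=0.0, direction=1.0, imax=100,
               minmax=(2e-07, 4.5e-05)):
    """Search away from start for x with minmax[0] <= kernel(x) <= minmax[1].

    Assumes kernel decreases monotonically away from start in direction.
    Returns the point found, or the last point tried after imax steps.
    """
    kmin, kmax = minmax
    near, far = 0.0, None  # distances from start that bracket the target
    dist = 1.0
    for _ in range(imax):
        value = kernel(start + direction * dist)
        if value > kmax:    # still too close to the mode: move outward
            near = dist
            dist = dist * 2.0 if far is None else 0.5 * (dist + far)
        elif value < kmin:  # overshot: move back toward the mode
            far = dist
            dist = 0.5 * (near + dist)
        else:
            break
    return start + direction * dist


def gaussian_kernel(x):
    """Unnormalized standard normal, used only as a test kernel."""
    return math.exp(-0.5 * x * x)


upper = find_bound(gaussian_kernel, direction=1.0)
lower = find_bound(gaussian_kernel, direction=-1.0)
```

Running the search once in each direction gives a (lower, upper) pair that could serve as bounds of integration when no data is available.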

std(use_sp=True)¶

Standard Deviation of the QRSE distribution

Parameters:	use_sp (bool) – if False, will find optimum over grid of ticks. If True (default), will use scipy integrate/maximize
Returns:	estimated standard deviation of the distribution

update_p0(data, weights=None, i_std=7)¶

use_xi¶: does the model use xi?

xi¶: mean of the data

pyqrse.model.model.available_kernels()¶

Module contents¶