The pyqrse API reference

QRSEModel

class pyqrse.model.model.QRSEModel(kernel='S', data=None, params=None, i_ticks=1000, i_stds=10, i_bounds=(-10, 10), about_data='', norm_data=False, **kwargs)

The primary model container for working with QRSE Models

QRSEModel can be instantiated as either QRSEModel() or QRSE(). When working in a Jupyter notebook, the shorter version is generally preferable. When scripting, however, QRSEModel is preferred.

When instantiated the QRSEModel attempts to find the appropriate bounds of integration, self.i_bounds, by trying the following three methods in order:

  1. Use the sufficient statistics of the data.

  2. Identify the meaningful support of the kernel using the parameter values provided.

  3. If no data or parameters are provided, use the defaults from the kernel.

Parameters:
  • kernel (str, object) – can either be a kernel code or QRSE kernel class object, which includes any QRSE kernel. The default kernel is the SQRSEKernel. Available kernels can be seen by running:

    >>> pyqrse.available_kernels()
    
  • data (np.array or str or None) – If a 1d np.array, sets self.data to that array. If a str, loads the data from that path (e.g. ‘path/to/data.csv’) using pandas.read_csv; depending on the format of the data, it may be necessary to pass pandas.read_csv keywords. If None, uses params to instantiate the model.

  • params (np.array or None) – must be an np.array of the appropriate length.

  • i_ticks (int) – the number of ticks in the grid of integration values. Default is 1000.

  • i_stds (int) – the number of data standard deviations used to set i_bounds. Default is 10.

  • about_data (str) – saves notes about the data to the self.notes dictionary.

  • norm_data (bool) – if True, will normalize the data. If the data is normalized, the data sufficient statistics can be accessed at self.data_suff_stats

  • kwargs – optional keyword arguments for pandas.read_csv and ~

  • If the model is instantiated without data, data should only be added using the add_data method.
  • If the model is instantiated without data or params, model parameter values should be added using the self.setup_from_params method.

action_entropy(x)

Entropy of conditional action distribution

H(p(a|x)) = -SUM p(a_i|x) * log(p(a_i|x)) for i=1,2 (binary) (i=1,2,3 for ternary)

action entropy is evaluated using self.params

Parameters:x (float or np.array([float])) – value of data being tested
Returns:float or np.array([float])
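
As a rough illustration of the quantity described above, the sketch below computes the Shannon entropy of a binary conditional action distribution. The logistic form of p(a|x) and the names binary_action_probs, mu, and temperature are assumptions made for this example; they are not part of the pyqrse API, and the kernels in pyqrse may define p(a|x) differently.

```python
import numpy as np

def binary_action_probs(x, mu=0.0, temperature=1.0):
    """Hypothetical logistic choice probabilities p(a_1|x) and p(a_2|x)."""
    x = np.asarray(x, dtype=float)
    p1 = 1.0 / (1.0 + np.exp(-(x - mu) / temperature))
    return np.stack([p1, 1.0 - p1])

def action_entropy(probs):
    """Shannon entropy -sum_i p_i * log(p_i), summed over the action axis (axis 0)."""
    probs = np.clip(probs, 1e-300, 1.0)  # guard against log(0)
    return -np.sum(probs * np.log(probs), axis=0)

# entropy peaks at log(2) where both actions are equally likely (x == mu)
h = action_entropy(binary_action_probs(np.array([-5.0, 0.0, 5.0])))
```

The entropy is largest where the two actions are equally likely and falls toward zero as one action dominates.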
actions

action set

add_data(data, index_col=0, header=None, squeeze=True, silent=False, save_abs_path=False, norm_data=False, **kwargs)

Primary means of adding data to model. It will set integration defaults according to the shape of the data.

Parameters:
  • data – either pandas.Series, np.ndarray, or “path/to/data”
  • index_col – pandas.read_csv keyword argument
  • header – pandas.read_csv keyword argument
  • squeeze – pandas.read_csv keyword argument
  • silent – if True, no printing while running. Default is False
  • save_abs_path – if True, will save the absolute instead of the relative path to the data. Useful if saving the object to a different location
  • norm_data – True or False to normalize data
  • kwargs – keyword arguments for pandas.read_csv
Returns:

aic(count_xi=True)

Akaike information criterion

Parameters:count_xi (bool) – default is True. Count xi in the parameter count if kernel uses xi
Returns:aic of model given data and parameter values
aicc(count_xi=True)
Returns:aicc of model given data
bic(count_xi=True)
Returns:bic of model given data
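
For orientation, the three criteria can be sketched from their standard textbook definitions. This is a minimal sketch, assuming the usual formulas; the function name information_criteria is hypothetical, and how pyqrse counts parameters (for example, how count_xi enters k) is not shown here.

```python
import numpy as np

def information_criteria(nll, k, n):
    """Return (aic, aicc, bic) from a negative log likelihood.

    nll: negative log likelihood at the fitted parameters
    k:   number of estimated parameters
    n:   number of observations (must exceed k + 1 for aicc)
    """
    aic = 2.0 * k + 2.0 * nll
    aicc = aic + (2.0 * k * (k + 1)) / (n - k - 1)  # small-sample correction
    bic = k * np.log(n) + 2.0 * nll
    return aic, aicc, bic

aic, aicc, bic = information_criteria(nll=120.0, k=3, n=100)
```

Lower values indicate a better trade-off between fit and parameter count under each criterion.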
code

QRSEModel Identification code for the Kernel

entropy(etype='joint')

Entropy of the QRSEModel

Will find the joint H(x, a), conditional H(x|a), or marginal entropy H(x). Note that conditional entropy H(x|a) is different from the entropy of the conditional distribution at some value x, H(p(a|x)).

To find H(p(a|x)), use the .action_entropy method.

Parameters:etype (str or list(str)) – type of entropy returned. The options are ‘joint’, ‘cond’, or ‘marg’. If entered as a list, will return an array of entropy values
Returns:float or np.array of floats
evidence(data=None)

Applies the pdf to self.data.

Parameters:data

fit(**kwargs)
Parameters:
Returns:

fitter = None

controls model fitting for QRSEModel

Allows fitting via Kullback-Leibler distance minimization and maximum likelihood estimation. See pyqrse.fittools.optimizer.QRSEFitter

fn_params

Number of parameters in the model including xi

if the model uses xi, add one to n_params

Returns:full number of parameters
fparams

full parameter values

Appends xi to params if the kernel uses xi

Returns:np.array(float) - params.append(xi)
fpnames

Full list of parameter names including xi

Appends xi to pnames if the kernel uses xi

Returns:np.array(float) - pnames.append(xi)
fpnames_latex

full latex formatted parameter names

Appends xi to pnames_latex if the kernel uses xi

Returns:pnames_latex.append(xi)
classmethod from_pickle(path_to_pickle, trust_check=False, **kwargs)

!!!DO NOT RUN THIS FUNCTION UNLESS YOU TRUST THE SOURCE WITH ABSOLUTE CERTAINTY!!!

Pickling is extremely convenient from a workflow perspective, as you can save the results of inquiries and instantly load them back into your python environment.

However, there are no safety checks on the code that will be run. That means:

  • If you don’t trust the source, don’t unpickle it.

  • Python will run all code in the pickle, malicious or not!
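
To make the warning concrete, the sketch below shows that unpickling can run a callable chosen by whoever created the file. The Payload class is a harmless, hypothetical stand-in and is unrelated to pyqrse; it uses only the standard library pickle module.

```python
import pickle

class Payload:
    """Harmless stand-in for a malicious object."""

    def __reduce__(self):
        # Tells pickle to rebuild this object by calling eval("6 * 7").
        # An attacker could name any importable callable here instead.
        return (eval, ("6 * 7",))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # eval("6 * 7") runs during loading
```

Loading the bytes does not return a Payload instance; it returns whatever the embedded call produced, which is why only trusted pickles should be loaded.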

Parameters:
  • path_to_pickle – individual or list of paths to saved pickled QRSE objects
  • args
  • kwargs
  • trust_check – prompts the user to verify that they trust the source of the file to be unpickled. Default value is True
Returns:

hess_fun(params)

Value of the Hessian of the negative log likelihood

Parameters:params (np.array(float)) – model parameters
Returns:np.array(float)(2d) value of Hessian at params
hess_inv_fun(params=None)

Inverse Hessian of the negative log likelihood

Parameters:params (np.array(float)) – model parameters
Returns:np.array(float)(2d) value of Inverse Hessian at params
history()
i_bounds

(min, max) of bounds of integration

indifference(actions=(0, 1))
Returns:x s.t. p(buy | x) = p(sell | x)
jac_fun(params=None)

Value of the jacobian of the negative log likelihood

Parameters:params (np.array(float)) – model parameters
Returns:np.array(float) value of jacobian at params
kernel_fun(x)

Value of the unnormalized kernel function

kernel = exp(potential + entropy)

evaluated at self.params

Parameters:x (float or np.array([float])) – value of data being tested
Returns:float or np.array([float])
log_kernel(x)

Log of the unnormalized kernel function

log_kernel = potential + entropy

evaluated at self.params

Parameters:x (float or np.array([float])) – value of data being tested
Returns:float or np.array([float])
log_partition(params=None)

evaluate the log of the QRSE partition function numerically

Parameters:params (np.array) – if params are None, will use self.params, otherwise will evaluate at params.
Returns:the value of the log partition function (float)
log_prior(params)

The log of the prior distribution of parameter values

Used for fitting the model to data. By default returns 0., which is equivalent to having no prior.

Can be overridden for an individual QRSEModel instance as follows:

  1. Instantiate the instance of the QRSEModel:

    qrse1 = QRSEModel('AT', data=data)
    
    # or
    
    qrse1 = QRSE('SF', data=data)
    
  2. Define a new function for the prior:

    def new_log_prior(params):
    
        # self is not included like in normal methods
        # params will be a 1d np.array the same length as n_params
    
    
        # prior hyper parameters should be hardcoded into function
    
        hyper_parameters = [hyper_parameter_0,
                            hyper_parameter_1,
                            hyper_parameter_2]
    
        # output of prior function should be negative to
        # penalize likelihood function
    
        negative_squared_loss = -(params - hyper_parameters)**2
        return np.sum(negative_squared_loss)
    
  3. Redefine the instance method to be the new function:

    qrse1.log_prior = new_log_prior
    

It is generally advised to change log_prior at the instance level. Changing it at the class level i.e:

QRSEModel.log_prior = new_log_prior

or

QRSE.log_prior = new_log_prior

will NOT work as intended.

Also, see pyqrse.model.model.QRSEModel.set_log_prior()

Parameters:params (np.array) – parameter values to evaluate
Returns:float
log_prob(*args, **kwargs)

log probability = -1 * ( negative log likelihood )

  • nll(self, data=None, params=None, weights=None, use_sp=False)
Parameters:
  • data
  • params
  • weights
  • use_sp
Returns:

logits(x, params=None)
Parameters:
  • x
  • params
Returns:

long_name = None

Longer name of the model. By default, the name is set to the full kernel name (e.g. Symmetric-QRSE).

mean(use_sp=True)

Mean of the QRSE distribution

Parameters:use_sp (bool) – if False, will use the grid of ticks; if True (default), will use scipy integrate/maximize
Returns:estimated mean of the distribution
mode(use_sp=True)

Mode of the QRSE distribution

Parameters:use_sp (bool) – if False, will find the optimum over the grid of ticks; if True (default), will use scipy integrate/maximize
Returns:estimated mode of the distribution
n_actions
n_params

number of free model parameters. does not include xi.

name = None

Name of the model. By default, the name is set to the abbreviated kernel name (e.g. S-QRSE).

nll(data=None, params=None, weights=None, use_sp=False)

value of the negative log likelihood function

Parameters:
  • data
  • params
  • weights
  • use_sp
Returns:

notes = None

A ‘notes’ dictionary for the model. Conveniently track things to remember about results. Especially useful when pickling the object.

params

np.array of parameter values

partition(params=None, use_sp=False)

evaluate the QRSE partition function numerically

Parameters:
  • params (np.array) – if params are None, will use self.params, otherwise will evaluate at params.
  • use_sp (bool) – If True, will evaluate using scipy.integrate.quad over the range self.i_bounds. If False (default), will evaluate over a grid of values by summing the value of the log_kernel over the grid, adjusting for step size, and taking exp(log_part). The grid method (False) tends to be quicker, and the resulting loss of precision is generally negligible.
Returns:the value of the partition function (float)
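
The grid method can be sketched in a few lines. This is a minimal sketch under stated assumptions: the standard normal log kernel is a stand-in for a real QRSE kernel, and grid_log_partition is a hypothetical helper, not the pyqrse implementation.

```python
import numpy as np

def log_kernel(x):
    """Stand-in log kernel (unnormalized standard normal)."""
    return -0.5 * x**2

def grid_log_partition(log_kernel_fun, i_bounds=(-10.0, 10.0), i_ticks=1000):
    """Riemann-sum estimate of log(integral of exp(log_kernel)) over i_bounds."""
    grid = np.linspace(i_bounds[0], i_bounds[1], i_ticks)
    step = grid[1] - grid[0]
    log_k = log_kernel_fun(grid)
    # log-sum-exp with the maximum subtracted for numerical stability
    log_max = np.max(log_k)
    log_sum = log_max + np.log(np.sum(np.exp(log_k - log_max)))
    # adjust for the step size to turn the sum into an integral estimate
    return log_sum + np.log(step)

log_part = grid_log_partition(log_kernel)
partition = np.exp(log_part)  # close to sqrt(2 * pi) for this kernel
```

Working in logs until the last step keeps the sum stable when kernel values are very small or very large.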

pdf(x, params=None)

Probability Density Function

Parameters:
  • x – input value or np.array of values
  • params – by default self.params, otherwise use values as input
Returns:pdf value at x or an np.array([…]) of pdf values

plot(**kwargs)
plot(self, which=0, params=None, bounds=None, ticks=1000, showdata=True, bins=20, title=None, dcolor='w', seaborn=True, lw=2, pi=1., colors=None, color_order=None, show_legend=True)
Parameters:
  • which
  • params
  • bounds
  • ticks
  • showdata
  • bins
  • title
  • dcolor
  • seaborn
  • lw
  • pi
  • colors
  • color_order
  • show_legend
Returns:

plotboth(**kwargs)

plot marginal distribution side by side with

Parameters:
  • args
  • kwargs

plotter = None

controls plotting for QRSEModel

see pyqrse.utilities.plottools.QRSEPlotter

pnames

parameter names

pnames_latex

latex formatted parameter names

potential(x)

potential function of the kernel

Parameters:x (float or np.array([float])) – value of data being tested
Returns:float or np.array([float])
res
reset_history()
rvs(n=1, bounds=None)

random variable sampler using the interpolated inverse cdf method

rvs works as follows:

  1. Creates a grid approximation of pdf based on bounds.
  2. Estimates the cdf using this grid.
  3. Interpolates the inverse cdf using sp.interpolate
  4. Samples from uniform(0,1) distribution
  5. Enters uniform samples into inverse cdf function
Parameters:
  • n (int) – number of samples. Must be a positive integer. Default is 1, which returns a single sample. If n > 1, returns an n-length np.array of samples
  • bounds (list, tuple or None) – [-10, 10, 10000] / (-10, 10, 10000) creates 10000 ticks between -10 and 10 as an estimate of the pdf of the model. If None (default), will use predetermined ticks based on the bounds of integration. Generally won’t need to be adjusted
Returns:float or np.array([float])
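
The five steps listed for rvs can be sketched as follows. This is a minimal sketch, not the pyqrse implementation: it uses np.interp in place of sp.interpolate, and the names inverse_cdf_sample, normal_pdf, and rng are assumptions made for the example.

```python
import numpy as np

def normal_pdf(x):
    """Stand-in pdf (standard normal) used only for this example."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def inverse_cdf_sample(pdf_fun, n=1, bounds=(-10.0, 10.0, 10000), rng=None):
    """Draw n samples via an interpolated inverse cdf built on a grid."""
    rng = np.random.default_rng() if rng is None else rng
    low, high, ticks = bounds
    # 1. grid approximation of the pdf
    grid = np.linspace(low, high, int(ticks))
    pdf = pdf_fun(grid)
    # 2. estimate the cdf on the grid, normalized so it ends at 1
    cdf = np.cumsum(pdf)
    cdf = cdf / cdf[-1]
    # 4. sample from uniform(0, 1)
    u = rng.uniform(0.0, 1.0, size=n)
    # 3 and 5. interpolate the inverse cdf (swap axes) at the uniform draws
    samples = np.interp(u, cdf, grid)
    return samples[0] if n == 1 else samples

draws = inverse_cdf_sample(normal_pdf, n=5, rng=np.random.default_rng(0))
```

A finer grid gives a closer approximation to the target distribution at the cost of more kernel evaluations.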

sampler = None

controls mcmc sampling for QRSEModel

see pyqrse.fittools.sampling.QRSESampler

save_history(new_hist=None)
set_hess_inv(from_res=False)

Set model inverse Hessian

primarily used to find the inverse Hessian for Sampling

Parameters:from_res (bool) – If False (default), will use Autograd to find the Hessian. If True, will use the estimated value from the .fit() optimization routine
set_log_prior(new_log_prior)

sets new log prior function so that it can access ‘self’

Alternative setter for pyqrse.model.model.QRSEModel.log_prior(). If accessing ‘self’ isn’t necessary, follow the instructions for that method.

Parameters:

new_log_prior (function) – new prior function must be of the form:

def new_log_prior(self, params):
    # complicated mathematics that
    # use params in addition to
    # self.attributes and/or self.methods
    return float_value_of_log_prior
  • params must be an np.array of the appropriate length
set_params(new_params, use_sp=True)

updates params and allows choice if self.partition uses sp or ticks

An alternative to using the self.params.setter that allows for choice of integration method

Parameters:
  • new_params (tuple, list, or np.ndarray) – new parameter values. Must be same length as params and should not include xi
  • use_sp (bool) – If True (default), solve partition function using scipy.integrate.quad. If False, will use ticks. When set to False, is equivalent to using self.params = new_params
setup_from_params(parameters, start=2, imax=100, minmax=(2e-07, 4.5e-05), find_mode=True, stds=None)

Find bounds of integration for a given parameterization

Will attempt to set model-wide variables appropriate to the model given the parameters. Does a binary search over the kernel values to find the points whose value is between the minmax bounds.

This function will not guarantee results when the starting point is not the mode of the kernel or if the function is not monotonically decreasing away from the mode.

This function is only necessary when working without data since bounds of integration can be inferred from the data.

Parameters:
  • parameters (tuple, list, or np.array) – parameter values to initialize the model
  • start (int or float) – if an int, uses the value at that index of params; if a float, uses that value; otherwise uses 0.
  • imax (int) – maximum number of steps before quitting search
  • minmax (tuple) – min, max values of kernel, by default searches for the range (2e-07, 4.5e-05)
  • find_mode (bool) – If True, searches for and begins from mode (default True)
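
The bound search can be illustrated with a small bisection routine. This is only a sketch of the general idea, assuming a kernel that decreases monotonically away from its mode; find_bound and the Gaussian kernel are hypothetical and do not reproduce the pyqrse code.

```python
import numpy as np

def kernel(x):
    """Stand-in kernel (unnormalized standard normal), mode at 0."""
    return np.exp(-0.5 * x**2)

def find_bound(kernel_fun, mode, direction, minmax=(2e-07, 4.5e-05), imax=100):
    """Search away from mode for x with minmax[0] < kernel(x) < minmax[1]."""
    lo_val, hi_val = minmax
    inner = mode  # point where the kernel is still too large
    step = 1.0
    outer = mode + direction * step
    # expand outward until the kernel drops below the upper threshold
    for _ in range(imax):
        if kernel_fun(outer) < hi_val:
            break
        inner = outer
        step *= 2.0
        outer = mode + direction * step
    # bisect between inner (too large) and outer (possibly too small)
    for _ in range(imax):
        if lo_val < kernel_fun(outer) < hi_val:
            break
        mid = 0.5 * (inner + outer)
        if kernel_fun(mid) >= hi_val:
            inner = mid
        else:
            outer = mid
    return outer

upper = find_bound(kernel, mode=0.0, direction=+1)
lower = find_bound(kernel, mode=0.0, direction=-1)
```

The pair (lower, upper) then plays the role of the bounds of integration for this stand-in kernel.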
std(use_sp=True)

Standard Deviation of the QRSE distribution

Parameters:use_sp (bool) – if False, will use the grid of ticks; if True (default), will use scipy integrate/maximize
Returns:estimated standard deviation of the distribution
to_pickle(path_to_pickle)

Uses the python pickle module to serialize the object (pickle it). Pickling allows it to be saved and reloaded later.

Parameters:path_to_pickle
Returns:
update_p0(data, weights=None, i_std=7)
use_xi

does the model use xi?

xi

mean of the data

QRSESampler

class pyqrse.fittools.sampling.QRSESampler(qrse_model, chain_format='df')

sampler doc_string

a_rates

acceptance rates for the sampler

chain
getdiff(parameter1, parameter2)

Get the difference between the chains of two parameters

Parameters:
  • parameter1 – string name for p1 (e.g. ‘t_buy’)
  • parameter2 – string name for p2 (e.g. ‘t_sell’)
Returns:np.ndarray

init(*args, **kwargs)

updates sampler with recent activity of the QRSEModel()

marg_like
max_like()
max_params
mcmc(N=1000, burn=0, single=False, ptype='corr', s=1.0, update_hess=False, new=False, use_tqdm=True)
Parameters:
  • N
  • burn
  • single
  • ptype
  • s
  • update_hess
  • new
Returns:

n_errors
next(sample_fun='joint', **kwargs)
plot(per_row=2, figsize=(12, 4), use_latex=True)

Parameters:
  • per_row
  • figsize

plotdiff(parameter1, parameter2, kind='hist', use_latex=True, figsize=None, **kwargs)

Quickly view the difference between the chains of two parameters

Parameters:
  • parameter1 – string name for p1 (e.g. ‘t_buy’)
  • parameter2 – string name for p2 (e.g. ‘t_sell’)
  • kind – ‘hist’ for histogram or ‘line’ for time-series
  • use_latex – use latex version of parameter names. default is True
  • figsize – invokes plt.figure(figsize=figsize).
  • kwargs – additional arguments for sns.distplot() and plt.plot()
Returns:

propose_new(params=None, ptype='corr', s=1.0)
set_params()

QRSEFitter

class pyqrse.fittools.optimizer.QRSEFitter(the_model)
fit(data=None, params0=None, summary=False, save=True, use_jac=True, weights=None, hist=False, check=False, silent=True, use_hess=False, smart_p0=True, use_sp=True, **kwargs)
Parameters:
Returns:

history()
kld(params=None, target=None)
Parameters:
  • params
  • target
Returns:

klmin(target=None, save=True, use_jac=True, **kwargs)
Parameters:
  • target
  • save
  • use_jac
  • kwargs
Returns:

reset_history()
save_history(new_hist=None)
set_kl_target(target)
Parameters:target
Returns:
update_model()

QRSEPlotter

class pyqrse.utilities.plottools.QRSEPlotter(qrse_object, colors=None, color_order=None)

THIS IS SWEET AT MAKING CHARTS

colors
plot(which=0, params=None, bounds=None, ticks=1000, showdata=True, bins=20, title=None, dcolor='w', seaborn=True, lw=2, pi=1.0, colors=None, color_order=None, show_legend=True)
Parameters:
  • which
  • params
  • bounds
  • ticks
  • showdata
  • bins
  • title
  • dcolor
  • seaborn
  • lw
  • pi
  • colors
  • color_order
  • show_legend
Returns:

plotboth(*args, **kwargs)

plot marginal distribution side by side with

Parameters:
  • args
  • kwargs

set_color_order(color_order=None, output=False)
set_colors(colors=None, output=False)