pyqrse.model package¶
Submodules¶
pyqrse.model.model module¶
-
class
pyqrse.model.model.QRSEModel(kernel='S', data=None, params=None, i_ticks=1000, i_stds=10, i_bounds=(-10, 10), about_data='', norm_data=False, **kwargs)¶ Bases:
pyqrse.utilities.mixins.HistoryMixin, pyqrse.utilities.mixins.PickleMixin
The primary model container for working with QRSE models.
QRSEModel can be instantiated as either QRSEModel() or QRSE(). When working in a Jupyter notebook, the shorter version is generally preferable. When scripting, however, QRSEModel is preferred.
When instantiated the QRSEModel attempts to find the appropriate bounds of integration, self.i_bounds, by trying the following three methods in order:
1. Use the sufficient statistics of the data.
2. Identify the meaningful support of the kernel using the parameter values provided.
3. If no data or parameters are provided, the model will use defaults from the kernel.
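As a rough illustration of the first method, the sketch below derives bounds of integration from the sample mean and standard deviation. The helper name `bounds_from_data` and the rule "mean plus or minus i_stds standard deviations" are assumptions made for illustration; the rule pyqrse actually applies may differ.

```python
import statistics

def bounds_from_data(data, i_stds=10):
    """Hypothetical helper: bounds as mean +/- i_stds standard deviations."""
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)  # population standard deviation
    return (mean - i_stds * std, mean + i_stds * std)

# Symmetric toy data, so the bounds are symmetric around zero.
lower, upper = bounds_from_data([-1.0, 0.0, 1.0], i_stds=10)
```

A wider i_stds gives a safer integration range at the cost of spending more grid ticks on regions where the density is close to zero.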
Parameters: kernel (str, object) – can either be a kernel code or a QRSE kernel class object, which includes any QRSE kernel. The default kernel is the SQRSEKernel. Available kernels can be seen by running:
>>> pyqrse.available_kernels()
data (np.array or str or None) – If a 1d np.array, will set self.data to that array. If str, will treat it as a path to the data (e.g. ‘path/to/data.csv’) and load it using pandas.read_csv. If None, will use params to instantiate the model. Depending on the format of the data, it may be necessary to use pandas.read_csv keywords.
params (np.array or None) – must be an np.array of the appropriate length.
i_ticks (int) – the number of ticks in grid of integration values. default is 1000.
i_stds (int) – the number of data standard deviations to set i_bounds to. default is 10.
about_data (str) – saves notes about the data to the self.notes dictionary.
norm_data (bool) – if True, will normalize data. If data is normalized, the sufficient statistics of the data can be accessed at self.data_suff_stats
kwargs – optional keyword arguments for pandas.read_csv and
~
- If the model is instantiated without data, data should only be added using the add_data method.
- If the model is instantiated without data or params, model parameter values should be added using the self.setup_from_params method.
-
action_entropy(x)¶ Entropy of conditional action distribution
H(p(a|x)) = -SUM p(a_i|x) * log p(a_i|x) for i=1,2 (binary) (i=1,2,3 for ternary)
action entropy is evaluated using self.params
Parameters: x (float or np.array([float])) – value of data being tested Returns: float or np.array([float])
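For a binary action set, the conditional action entropy can be sketched as below. The logistic form of p(a|x) and the parameter names `mu` and `temperature` are assumptions chosen for illustration, not necessarily the parameterization used by any particular QRSE kernel.

```python
import math

def action_probs(x, mu=0.0, temperature=1.0):
    """Assumed binary logit response: returns (p(buy|x), p(sell|x))."""
    p_buy = 1.0 / (1.0 + math.exp(-(x - mu) / temperature))
    return p_buy, 1.0 - p_buy

def action_entropy(x, mu=0.0, temperature=1.0):
    """H(p(a|x)) = -sum_i p(a_i|x) * log p(a_i|x)."""
    probs = action_probs(x, mu, temperature)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

h_at_mu = action_entropy(0.0)  # both actions equally likely at x = mu
h_far = action_entropy(5.0)    # one action dominates far from mu
```

Under these assumptions the entropy peaks at log(2) where the two actions are equally likely and falls toward zero as one action dominates.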
-
actions¶ action set
-
add_data(data, index_col=0, header=None, squeeze=True, silent=False, save_abs_path=False, norm_data=False, **kwargs)¶ Primary means of adding data to model. It will set integration defaults according to the shape of the data.
Parameters: - data – either pandas.Series, np.ndarray, or “path/to/data”
- index_col – pandas.read_csv keyword argument
- header – pandas.read_csv keyword argument
- squeeze – pandas.read_csv keyword argument
- silent – no printing while running. default is False
- save_abs_path – if True, will save the absolute instead of the relative path to the data. Useful if saving the object to a different location
- norm_data – True or False to normalize data
- kwargs – keyword arguments for pandas.read_csv
Returns:
-
aic(count_xi=True)¶ Akaike information criterion
Parameters: count_xi (bool) – default is True. Count xi in the parameter count if kernel uses xi Returns: aic of model given data and parameter values
-
aicc(count_xi=True)¶ Returns: aicc of model given data
-
bic(count_xi=True)¶ Returns: bic of model given data
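For reference, the three criteria follow the standard textbook definitions sketched below, where `nll` is the negative log likelihood at the fitted parameters, `k` is the number of parameters (including xi when count_xi is True), and `n` is the number of observations. These free functions are illustrative only and are not pyqrse's implementation.

```python
import math

def aic(nll, k):
    """Akaike information criterion: 2k - 2*log_likelihood."""
    return 2.0 * k + 2.0 * nll

def aicc(nll, k, n):
    """AIC with the small-sample correction term."""
    return aic(nll, k) + (2.0 * k * (k + 1)) / (n - k - 1)

def bic(nll, k, n):
    """Bayesian information criterion: k*log(n) - 2*log_likelihood."""
    return k * math.log(n) + 2.0 * nll
```

Lower values indicate a better trade-off between fit and parameter count; aicc converges to aic as n grows relative to k.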
-
code¶ QRSEModel Identification code for the Kernel
-
entropy(etype='joint')¶ Entropy of the QRSEModel
Will find the joint H(x, a), conditional H(x|a), or marginal entropy H(x). Note that conditional entropy H(x|a) is different from the entropy of the conditional distribution at some value x, H(p(a|x)).
To find H(p(a|x)) use the .action_entropy method:
Parameters: etype (str or list(str)) – type of entropy returned. The options are ‘joint’, ‘cond’, or ‘marg’. If entered as a list, will return an array of entropy values Returns: float or np.array of floats
-
evidence(data=None)¶ applies pdf to self.data
Parameters: data –
Returns:
-
fit(**kwargs)¶ Parameters: - data –
- params0 –
- summary –
- save –
- use_jac –
- weights –
- hist –
- check –
- silent –
- use_hess –
- smart_p0 –
- use_sp –
- kwargs – see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html
Returns:
-
fitter= None¶ controls model fitting for QRSEModel
allows fitting via Kullback-Leibler distance minimization and maximum likelihood estimation; see pyqrse.fittools.optimizer.QRSEFitter
-
fn_params¶ Number of parameters in the model including xi
if the model uses xi, add one to n_params
Returns: full number of parameters
-
fparams¶ full parameter values
Appends xi to params if the kernel uses xi
Returns: np.array(float) - params.append(xi)
-
fpnames¶ Full list of parameter names including xi
Appends xi to pnames if the kernel uses xi
Returns: np.array(float) - pnames.append(xi)
-
fpnames_latex¶ full latex formatted parameter names
Appends xi to pnames_latex if the kernel uses xi
Returns: pnames_latex.append(xi)
-
hess_fun(params)¶ Value of the Hessian of the negative log likelihood
Parameters: params (np.array(float)) – model parameters Returns: np.array(float)(2d) value of Hessian at params
-
hess_inv_fun(params=None)¶ Inverse Hessian of the negative log likelihood
Parameters: params (np.array(float)) – model parameters Returns: np.array(float)(2d) value of Inverse Hessian at params
-
i_bounds¶ (min, max) of bounds of integration
-
indifference(actions=(0, 1))¶ Returns: x s.t. p(buy | x) = p(sell | x)
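The indifference point can be pictured as the root of p(buy | x) - p(sell | x). The sketch below finds it by bisection for an assumed binary logit response; the function names, the parameter values, and the choice of bisection are illustrative assumptions rather than the method pyqrse uses.

```python
import math

def p_buy(x, mu=0.5, temperature=1.0):
    """Assumed binary logit probability of the 'buy' action."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / temperature))

def indifference(lo=-10.0, hi=10.0, tol=1e-10):
    """Bisect on p(buy|x) - p(sell|x), which is increasing in x here."""
    def gap(x):
        return 2.0 * p_buy(x) - 1.0  # equals p(buy|x) - p(sell|x)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_star = indifference()
```

For this logit form the root coincides with `mu`, which makes the result easy to check by hand.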
-
jac_fun(params=None)¶ Value of the Jacobian of the negative log likelihood
Parameters: params (np.array(float)) – model parameters Returns: np.array(float) value of jacobian at params
-
kernel_fun(x)¶ Value of the unnormalized kernel function
kernel = exp(potential + entropy)
evaluated at self.params
Parameters: x (float or np.array([float])) – value of data being tested Returns: float or np.array([float])
-
log_kernel(x)¶ Log of the unnormalized kernel function
log_kernel = potential + entropy
evaluated at self.params
Parameters: x (float or np.array([float])) – value of data being tested Returns: float or np.array([float])
-
log_partition(params=None)¶ evaluate the log of the QRSE partition function numerically
Parameters: params (np.array) – if params are None, will use self.params, otherwise will evaluate at params. Returns: the value of the log partition function (float)
-
log_prior(params)¶ The log of the prior distribution of parameter values
Used for fitting the model to data. By default returns 0., which is equivalent to having no prior.
Can be overridden for an individual QRSEModel instance as follows:
Instantiate the instance of the QRSEModel:
qrse1 = QRSEModel('AT', data=data) # or qrse1 = QRSE('SF', data=data)
Define a new function for the prior:
    import numpy as np

    def new_log_prior(params):
        # self is not included like in normal methods
        # params will be a 1d np.array the same length as n_params
        # prior hyper parameters should be hardcoded into function
        hyper_parameters = [hyper_parameter_0,
                            hyper_parameter_1,
                            hyper_parameter_2]
        # output of prior function should be negative to
        # penalize likelihood function
        negative_squared_loss = -(params - hyper_parameters)**2
        return np.sum(negative_squared_loss)
Redefine the instance method to be the new function:
qrse1.log_prior = new_log_prior
It is generally advised to change log_prior at the instance level. Changing it at the class level i.e:
QRSEModel.log_prior = new_log_prior
or
QRSE.log_prior = new_log_prior
will NOT work as intended.
Also, see pyqrse.model.model.QRSEModel.set_log_prior()
Parameters: params (np.array) – parameter values to evaluate Returns: float
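One reason instance-level and class-level assignment differ is ordinary Python attribute lookup: a function stored on an instance is called as-is, while a function stored on a class is bound as a method and receives the instance as its first argument. The toy classes below are hypothetical stand-ins that demonstrate only this mechanism; pyqrse may have additional reasons for the class-level approach not working.

```python
class ToyModel:
    """Hypothetical stand-in for a model with an overridable log_prior."""

    def log_prior(self, params):
        return 0.0

def new_log_prior(params):
    # Penalize parameters far from zero.
    return -sum(p ** 2 for p in params)

model = ToyModel()
model.log_prior = new_log_prior  # stored as a plain function on the instance
instance_value = model.log_prior([1.0, 2.0])

class OtherToyModel(ToyModel):
    pass

# Assigned on the class, the function becomes a method, so the instance
# is passed in the params position and the call has one argument too many.
OtherToyModel.log_prior = new_log_prior
try:
    OtherToyModel().log_prior([1.0, 2.0])
    class_level_failed = False
except TypeError:
    class_level_failed = True
```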
-
log_prob(*args, **kwargs)¶ log probability = -1 * ( negative log likelihood )
- nll(self, data=None, params=None, weights=None, use_sp=False)
Parameters: - data –
- params –
- weights –
- use_sp –
Returns:
-
logits(x, params=None)¶ Parameters: - x –
- params –
Returns:
-
long_name= None¶ longer name of the model. By default, it is set to the full kernel name (e.g. Symmetric-QRSE)
-
mean(use_sp=True)¶ Mean of the QRSE distribution
Parameters: use_sp (bool) – if False, will estimate over the grid of ticks; if True (default), will use scipy integrate/maximize
Returns: estimated mean of the distribution
-
mode(use_sp=True)¶ Mode of the QRSE distribution
Use_sp: if False will find optimum over grid of ticks if True (default) will use scipy integrate/maximize Returns: estimated mode of the distribution
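The grid alternative (use_sp=False) can be pictured as evaluating the unnormalized density on evenly spaced ticks and reading the mean and mode off that grid. The sketch below is a minimal stand-alone version under that assumption; `grid_mean_and_mode` is a hypothetical helper and the Gaussian-shaped log kernel is only a test case.

```python
import math

def grid_mean_and_mode(log_kernel, bounds=(-10.0, 10.0), ticks=1000):
    """Estimate mean and mode of the density proportional to exp(log_kernel)."""
    lo, hi = bounds
    step = (hi - lo) / (ticks - 1)
    xs = [lo + i * step for i in range(ticks)]
    weights = [math.exp(log_kernel(x)) for x in xs]
    total = sum(weights)
    # Weighted average of the grid points; the step size cancels out.
    mean = sum(x * w for x, w in zip(xs, weights)) / total
    # Grid point with the largest kernel value.
    mode = xs[max(range(ticks), key=weights.__getitem__)]
    return mean, mode

# Gaussian-shaped log kernel centered at 1.5.
mean, mode = grid_mean_and_mode(lambda x: -0.5 * (x - 1.5) ** 2)
```

The grid mode can only be as precise as the tick spacing, which is one reason an optimizer-based estimate may be preferred when precision matters.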
-
n_actions¶
-
n_params¶ number of free model parameters. does not include xi.
-
name= None¶ name of the model. By default, it is set to the abbreviated kernel name (e.g. S-QRSE)
-
nll(data=None, params=None, weights=None, use_sp=False)¶ value of the negative log likelihood function
Parameters: - data –
- params –
- weights –
- use_sp –
Returns:
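As a point of reference, a negative log likelihood of this kind is the negated sum of log densities over the observations, and log_prob is its negation. The sketch below uses a standard normal density as a stand-in for the QRSE pdf; treating `weights` as per-observation multipliers is an assumption, since the source does not describe them.

```python
import math

def nll(data, log_pdf, weights=None):
    """Negative log likelihood: -sum_i w_i * log pdf(x_i)."""
    if weights is None:
        weights = [1.0] * len(data)
    return -sum(w * log_pdf(x) for x, w in zip(data, weights))

def std_normal_log_pdf(x):
    # Stand-in density; a QRSE model would use its own log pdf here.
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

value = nll([0.0, 1.0, -1.0], std_normal_log_pdf)
log_prob = -value
```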
-
notes= None¶ A ‘notes’ dictionary for the model. Conveniently tracks things to remember about results. Especially useful when pickling the object
-
params¶ np.array of parameter values
-
partition(params=None, use_sp=False)¶ evaluate the QRSE partition function numerically
Parameters: - params (np.array) – if params are None, will use self.params, otherwise will evaluate at params.
- use_sp (bool) – If True, will evaluate using scipy.integrate.quad over the range self.i_bounds. If False (default), will evaluate over a grid of values by summing the value of the log_kernel over the grid, adjusting for step size, and taking exp(log_part). The grid method (False) tends to be quicker, and the loss of precision relative to numerical integration is generally negligible.
Returns: the value of the partition function (float)
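The grid method described above can be sketched as a log-sum-exp over log kernel values plus the log of the step size. This is a minimal stand-alone version assuming an evenly spaced grid; `log_partition_grid` is a hypothetical name, and the Gaussian log kernel is used only because its partition function, sqrt(2*pi), is known.

```python
import math

def log_partition_grid(log_kernel, bounds=(-10.0, 10.0), ticks=1000):
    """Approximate log of the integral of exp(log_kernel) over bounds."""
    lo, hi = bounds
    step = (hi - lo) / (ticks - 1)
    values = [log_kernel(lo + i * step) for i in range(ticks)]
    peak = max(values)  # subtract the peak for numerical stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in values))
    return log_sum + math.log(step)  # adjust for step size

log_z = log_partition_grid(lambda x: -0.5 * x * x)
z = math.exp(log_z)  # close to sqrt(2 * pi) for this kernel
```

Working in logs until the final exp avoids overflow when the kernel takes very large or very small values.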
-
pdf(x, params=None)¶ Probability Density Function
Parameters: - x – input value or np.array of values
- params – by default self.params, otherwise use the given values as input
Returns: pdf value at x or an np.array([…]) of pdf values
-
plot(**kwargs)¶ - plot(self, which=0, params=None, bounds=None, ticks=1000, showdata=True, bins=20, title=None, dcolor='w', seaborn=True, lw=2, pi=1., colors=None, color_order=None, show_legend=True)
Parameters: - which –
- params –
- bounds –
- ticks –
- showdata –
- bins –
- title –
- dcolor –
- seaborn –
- lw –
- pi –
- colors –
- color_order –
- show_legend –
Returns:
-
plotboth(**kwargs)¶ plot marginal distribution side by side with
Parameters: - args –
- kwargs –
Returns:
-
plotter= None¶ controls plotting for QRSEModel
see pyqrse.utilities.plottools.QRSEPlotter
-
pnames¶ parameter names
-
pnames_latex¶ latex formatted parameter names
-
potential(x)¶ potential function of the kernel
Parameters: x (float or np.array([float])) – value of data being tested Returns: float or np.array([float])
-
res¶
-
rvs(n=1, bounds=None)¶ random variable sampler using the interpolated inverse cdf method
rvs works as follows:
- Creates a grid approximation of pdf based on bounds.
- Estimates the cdf using this grid.
- Interpolates the inverse cdf using sp.interpolate
- Samples from uniform(0,1) distribution
- Enters uniform samples into inverse cdf function
Parameters: - n (int) – number of samples. Must be a positive integer. Default is 1, which returns a single sample. If n > 1, returns an n length np.array of samples
- bounds (list, tuple or None) – [-10, 10, 10000] / (-10, 10, 10000) creates 10000 ticks between -10 and 10 as an estimate of the pdf of the model. If None (default), will use predetermined ticks based on the bounds of integration. bounds is preset to None and generally won’t need to be adjusted
Returns: float or np.array([float])
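The five steps above can be sketched with the standard library alone. This version swaps sp.interpolate for a hand-written linear interpolation using `bisect`, and `make_sampler` is a hypothetical helper name, so it illustrates the technique rather than reproducing pyqrse's code.

```python
import bisect
import math
import random

def make_sampler(pdf, bounds=(-10.0, 10.0, 10000), seed=None):
    """Build an rvs-style sampler from an unnormalized pdf on a grid."""
    lo, hi, ticks = bounds
    step = (hi - lo) / (ticks - 1)
    xs = [lo + i * step for i in range(ticks)]
    # Steps 1-2: grid approximation of the pdf and its running sum (cdf).
    cdf, total = [], 0.0
    for x in xs:
        total += pdf(x)
        cdf.append(total)
    cdf = [c / total for c in cdf]
    rng = random.Random(seed)

    def inverse_cdf(u):
        # Step 3: linear interpolation of the inverse cdf between ticks.
        j = min(bisect.bisect_left(cdf, u), ticks - 1)
        if j == 0:
            return xs[0]
        c0, c1 = cdf[j - 1], cdf[j]
        frac = 0.0 if c1 == c0 else (u - c0) / (c1 - c0)
        return xs[j - 1] + frac * step

    def rvs(n=1):
        # Steps 4-5: uniform draws pushed through the inverse cdf.
        draws = [inverse_cdf(rng.random()) for _ in range(n)]
        return draws[0] if n == 1 else draws

    return rvs

sample = make_sampler(lambda x: math.exp(-0.5 * x * x), seed=0)
draws = sample(5000)  # list of 5000 approximately standard normal draws
single = sample()     # a single float
```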
-
sampler= None¶ controls MCMC sampling for QRSEModel; see pyqrse.fittools.sampling.QRSESampler
-
set_hess_inv(from_res=False)¶ Set model inverse Hessian
primarily used to find the inverse Hessian for Sampling
Parameters: from_res (bool) – If False (default), will use Autograd to find the Hessian. If True, will use the estimated value from the .fit() optimization routine
-
set_log_prior(new_log_prior)¶ sets new log prior function so that it can access ‘self’
Alternative setter for pyqrse.model.model.QRSEModel.log_prior(). If accessing ‘self’ isn’t necessary, follow the instructions for that method.
Parameters: new_log_prior (function) – new prior function, which must be of the form:
    def new_log_prior(self, params):
        # complicated mathematics that
        # use params in addition to
        # self.attributes and/or self.methods
        return float_value_of_log_prior
- params must be an np.array of the appropriate length
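A setter of this kind typically works by binding the function to the instance so that it receives self when called. The sketch below shows that mechanism with `types.MethodType` on a hypothetical toy class; whether pyqrse uses this exact approach internally is an assumption.

```python
import types

class ToyModelWithSetter:
    """Hypothetical stand-in; only the binding mechanism is illustrated."""

    def __init__(self, scale):
        self.scale = scale

    def log_prior(self, params):
        return 0.0

    def set_log_prior(self, new_log_prior):
        # Bind the function to this instance so it receives self.
        self.log_prior = types.MethodType(new_log_prior, self)

def scaled_log_prior(self, params):
    # Uses an attribute of the model in addition to params.
    return -self.scale * sum(p ** 2 for p in params)

model = ToyModelWithSetter(scale=2.0)
model.set_log_prior(scaled_log_prior)
value = model.log_prior([1.0, 2.0])
```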
-
set_params(new_params, use_sp=True)¶ updates params and allows choice if self.partition uses sp or ticks
An alternative to using the self.params.setter that allows for choice of integration method
Parameters: - new_params (tuple, list, or np.ndarray) – new parameter values. Must be same length as params and should not include xi
- use_sp (bool) – If True (default), solve partition function using scipy.integrate.quad. If False, will use ticks. When set to False, is equivalent to using self.params = new_params
-
setup_from_params(parameters, start=2, imax=100, minmax=(2e-07, 4.5e-05), find_mode=True, stds=None)¶ Find bounds of integration for a given parameterization
Will attempt to set model-wide variables appropriate to the model given the parameters. Does a binary search over the kernel values to find the points whose value is between the minmax bounds.
This function will not guarantee results when the starting point is not the mode of the kernel or if the function is not monotonically decreasing away from the mode.
This function is only necessary when working without data since bounds of integration can be inferred from the data.
Parameters: - parameters (tuple, list, or np.array) – parameter values to initialize the model
- start (int or float) – starting point of the search. If int, uses the value at that index of params; if float, uses that value; otherwise uses 0.
- imax (int) – maximum number of steps before quitting search
- minmax (tuple) – min, max values of kernel, by default searches for the range (2e-07, 4.5e-05)
- find_mode (bool) – If True, searches for and begins from mode (default True)
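The search described above can be sketched for one side of the mode as follows. The helper `find_bound`, the doubling step used to bracket the target, and the Gaussian-shaped kernel are assumptions for illustration; the routine in pyqrse may organize the search differently.

```python
import math

def find_bound(kernel, mode, direction, minmax=(2e-07, 4.5e-05), imax=100):
    """Search away from the mode for x with minmax[0] <= kernel(x) <= minmax[1].

    Assumes the kernel decreases monotonically away from the mode.
    """
    kmin, kmax = minmax
    near, far = mode, mode + direction
    # Expand outward until the far point drops to or below the upper threshold.
    for _ in range(imax):
        if kernel(far) <= kmax:
            break
        near, far = far, mode + 2.0 * (far - mode)
    # Bisect between a point that is too high (near) and one that is not (far).
    for _ in range(imax):
        if kmin <= kernel(far) <= kmax:
            return far
        mid = 0.5 * (near + far)
        if kernel(mid) > kmax:
            near = mid
        else:
            far = mid
    return far

def toy_kernel(x):
    return math.exp(-0.5 * x * x)

upper = find_bound(toy_kernel, mode=0.0, direction=1.0)
lower = find_bound(toy_kernel, mode=0.0, direction=-1.0)
```

If the kernel is not monotone away from the starting point, the bracketing step can skip past the target band, which matches the caveat stated for this method.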
-
std(use_sp=True)¶ Standard Deviation of the QRSE distribution
Parameters: use_sp (bool) – if False, will estimate over the grid of ticks; if True (default), will use scipy integrate/maximize
Returns: estimated standard deviation of the distribution
-
update_p0(data, weights=None, i_std=7)¶
-
use_xi¶ does the model use xi?
-
xi¶ mean of the data
-
pyqrse.model.model.available_kernels()¶