karoo_gp/RELEASE_NOTES.txt

442 lines
21 KiB
Plaintext

2017 07/03
I am pleased to announce that Karoo GP is now updated to include a full suite of mathematical operators. I thank the
expert code development of Iurii Milovanov. He was instrumental in bringing TensorFlow into Karoo last year, and has
now provided this important improvement.
Iurii has prpared an efficient and elegant solution for the addition of a full range of operators, adding support for
boolean operations (a and b or c), comparison ops (a > b <= c == d) and generally speaking any function available in
TensorFlow, as given here: https://www.tensorflow.org/api_guides/python/math_ops
Now there is a mapping in Karoo GP that connects an expression to the TF function:
OPERATOR EXAMPLE
add a + b
subtract a - b
multiply a * b
divide a / b
pow a ** 2
negative -a
logical_and a and b
logical_or a or b
logical_not not a
equal a == b
not_equal a != b
less a < b
less_equal a <= b
greater a > b
greater_equal a >= 1
abs abs(a)
sign sign(a)
square square(a)
sqrt sqrt(a)
pow pow(a, b)
log log(a)
log1p log1p(a)
cos cos(a)
sin sin(a)
tan tan(a)
acos acos(a)
asin asin(a)
atan atan(a)
Now when Karoo parses an expression like "asin(x + 10)" it first parses and transforms the args "x + 10" and passes
the output to “tf.asin” function. So now Karoo is able to evaluate incredibly complx mathematical expressions.
It is important to refer to karoo_gp/files/templates/operators_list.txt to learn how to enage operators_[KERNEL].csv
In its current form Karoo requires some operators, such as abs, cos, and sin to have a unique formating:
+ sin,2
- sin,2
* sin,2
/ sin,2
In a future version of Karoo, this will be managed automatically by the recursive function which extracts the
expression from a GP Tree.
2017 06/06
A number of changes applied in March and April. My apologies for the delayed updates to github.
A timer is added. This is helpful when conducting comparisons of run times, given various configurations and datasets.
If you go looking for this code, search for 'time.time' in _main.py for the Desktop interface, and in _base_class.py
for the Server interface.
Raised tree_depth_max to 100 in _main in order to essentially remove the tree depth ceiling, allowing for the multi-
generational increase in tree depth. While this re-introduces bloat, it does provide for a greater degree of
experimental freedom. A reminder that the _server has always allowed for tree_depth_max to be set to any level.
Changed the balance of operators (reproduction, mutation, crossover) from a percentage to a fixed quantity in both the
default settings and in the user defined alteration.
The User Guide is updated with some minor adjustments to wording.
I have removed tools/ from Karoo package, as these will be re-launched as their own set of packages and associated
github repository. I am working to give each a proper interface and more broad functionality outside of the
Karoo package. Stay tuned!
Please also note that the variable os.environ (line 28 in _base_class.py) can be set to [0, 1, 2] to trigger 3 levels
of on-screen, TensorFlow run-time error logging. You can experiment with this to determine what is right for you.
The much desired trig operators will be returned to Karoo GP v1.0.4. Log and boolean operators shortly thereafter. I
offer my apology for the delay, as I know this has inhibited some expanded uses of Karoo in the past few months.
2017 02/13
Returned 'cwd = os.getcwd()' to 'cwd = os.path.dirname(full_path)' in fx_karoo_data_load in order to sustain chron
compliance (sorry Marco). Then split end-of-run logging into 2 files: runtime_config.txt and test.txt. Looking to the
future in which the test log, in particular, might be stored in such a way as to be readily parsed by an external
script for automated comparison of multiple runs.
Ideas are welcomed!
2017 02/09
The User Guide has been updated to match the functions of Karoo GP v1.0.1.
2017 02/08
The parameters.txt file now captures the full run-time configuration of both karoo_gp_main.py and karoo_gp_server.py.
This file is written to a unique (datetime stamp) directory in karoo_gp/runs/ at the auto-termination of _server.py
and with the user executed 'q'uit of _main.py.
The final list of best fit evolved Trees and the test of the highest numbered (usually the highest performing) Tree is
also recorded, with the test automatically executed based upon the original kernel designation. This functionality
supports the fully hands-off execution of Karoo GP on a remote server, from a bash or Python script, for parallel or
multiple serial evolutionary runs.
Thank you Marco Cavaglia for guiding the addition of this functionality to Karoo GP!
2017 02/06
Graphics Processing Units (GPU) are now supported with the introduction of the Python library TensorFlow. The end
result is a staggering improvement in performance. With one comparison of a 10,000 data points (rows) x 9 features
(columns) dataset on a 40 core Intel Xeon motherboard versus a 2000 core Nvidia GPU card, the wall time was reduced
from 50 hours to less than 4 minutes. On CPU-only computers, the performance on a single core is as much as 10x
improved due to the vectorisation of the data and application of the C-based TensorFlow maths library.
To install TensorFlow, I recommend visiting https://www.tensorflow.org/get_started/ It is straight forward for Ubuntu,
but unfortunately can be rather challenging with OSX. Have patience. Review the forums. It's worth the effort.
I owe many thanks to the expertise of Iurii Milovanov, a contract developer whom I engaged for this effort. While the
number of lines of Karoo GP modified were initially less than a dozen (replacing the multi-core pprocess calls), I asked
Iurii to also rewrite the test functions. As such, both Training and Testing are now fully GPU enabled. Thank you!
A number of other changes have been integrated, including:
- Karoo GP is now developed against Python 2.7 as provided with Ubuntu Desktop 16.04.1.
- A number of Python methods have been deleted, added, modified and/or renamed. In particular, in the category
'fx_fitness_' If you have built your own code based on the Karoo methods, please review this section carefully.
- The user engaged 'bal'ance function (pause menu) has been rebuilt to anticipate exact quantities instead of
percentages, enabling the user to define precisely how many of each of the four genetic operators will be applied
with the construction of each subsequent population.
- Activation of the 'test' is now conducted with only the letter 't' and the option to engage a specific number of
'c'ores is removed. Therefore, the 't'imer mode is also removed, as this was a means to discover the optimal number
for multi-core processing which is now automated by TensorFlow.
- The libraries 'pprocess' and 'time' are no longer required nor imported.
- The population_* files (.csv) are now written into unique directories created inside of karoo_gp/runs/ with the
launch of each run. A .txt file is also written to each directory which captures the run-time configuration of
Karoo GP. This enables truly scriptable runs of Karoo.
- The Server interface to Karoo GP (karoo_gp_server.py) now terminates completely, kicking back to the command line.
This enables bash or chron launches of multiple sequential or parallel runs, enabling the exploration of multiple
runs with identical configuration, or that of varied configuration parameters.
Finally, Karoo GP is now a 1.0 release. I never know when to transition from beta to real, so please forgive me if I
jumped the gun. But with GPU support and the revised Server script, I have find Karoo GP to be a fully functional,
powerful machine learning tool. I hope you will agree --kai
2016 09/20
Fixed the genetic operator (b)alance function to work with large than 100 trees per population.
Introduced the pause for all runtime modes in the Desktop application, such that the user can apply configurations prior
to the run (eg: change the balance of the genetic operators or the number of engaged cores).
2016 09/19b
After another 2 hours of trouble shooting, I learned that sympy.subs throws the 'zoo' error for a divide-by-zero if
working with integers--and stalls (as I witnessed last year when developing Karoo), but if throws 'inf' or '-inf' if
working with floats and continues to process unencumbered. This means the 'zoo' trap has not been used for over a year.
It is now removed from the method fx_eval_subs().
2016 09/19
All experiments with lambdify are on hold as it is throwing 'divide by zero' errors when no 0 exists in the data.
In the evaluation of this issue, I returned to the means by which I deal with 'divide by zero', independent of the
library used to process the trees. Originally, 'result' was replaced by the value 1. But a colleague suggested that I
might be tossing genetic material of value to future generations. So, now a tree which produces a 'divide by zero'
error ('zoo' in .subs) is retained, but given a fitness score of 0.
Method fx_eval_subs() is updated to assign an 'error' tag to any tree which exhibits 'divide by zero' behaviour.
Method fx_fitness_eval() is updated to bypass processing of a tree which has been tagged with an 'error' tag.
Methods fx_test_classify(), fx_test_regress(), and fx_test_match() are each updated to send the 'tree_id' to
fx_eval_subs() as is now required.
For now, it is not recommended that you use the lambdify function in fx_eval_subs(), unless you know how to fix it.
2016 09/16
With the 09/14 update I failed to upload the new coefficients.csv file to the files/ directory. While not yet engaged,
this will be the means by which the user can define the constants desired for the Karoo GP run. If you had run Karoo GP
v0.9.2.0 in the past 24 hours without this file, it would have complained. My apology.
Also, a bit of a road map for the 2nd half of 2016, into 2017
- validate the new (faster) sympy.lambdify and fully replace the current (slower) sympy.subs
- replace the row-by-row dictionaries with vectors for what should be a significant performance increase
- complete the introduction of constants in a manner more well defined than is currently supported
- investigate replacing pprocess with the multi-core library
- introduce Theano or Tensor Flow for GPU support
I welcome any assistance with these, if anyone has experience and time.
2016 09/14 - version 0.9.2.0
In karoo_gp_base_class.py
- Removed redundant lines in the method 'fx_karoo_data_load()'
- Added support for the Sympy 'lambdify' function in 'fx_karoo_data_load' (see explanation below)
- Added a draft means of catching divide-by-zero errors in the new 'lambdify' function
- Discovered the prior 'fx_eval_subs' uncorrectly applied a value of 1 to the variable 'result' as a means to
replace the 'zoo' function for divide by zero errors. However, this could inadvertently undermine the success of
Classification and Regression runs. My apology for not catching this sooner.
"While attending the CHEAPR 2016 workshop hosted by the Center for Cosmology and Astro-Particle Physics, The Ohio State
University, Michael Zevin of Northwestern University proposed that Karoo GP *should* be able to process trees far faster
than what we were seeing. I looked into the Sympy functions I was at that time using. Indeed, '.subs' is noted as easy
to use, but terribly slow as it relies upon an internal, Python mathematical library. I therefore replaced '.subs' with
'.lambdify' which calls upon the C-based Numpy maths library. It is slated to be 500x faster than '.subs', but I am
seeing only a 2x performance increase. Clearly, there are yet other barriers to remove.
In the new 'fx_eval_subs' method you will find both sympy.subs (active) and sympy.lambdify. While preliminary tests
worked well, I witnessed an erratic outcome which I yet need to reproduce and investigate. Feel free to comment the
.subs and uncomment the .lambdify sections and take it for a spin.
I believe there are 2 more steps to increase performance: removing the dictionaries which contain each row, such that
Karoo is working directly with the Numpy array again, and then processing the array as a vector instead. But this will
require substantial recoding.
I'll keep you informed ..." --kai
2016 08/08 - version 0.9.1.9
In karoo_gp_base_class.py
- Created a new method 'fx_eval_subs()' which conducts the SymPy subs function for both train and test data.
- Consolidated duplicate SymPy sub calls in both Train and Test methods, at Erik H. suggestion --thank you!
- Consolidated duplicate lines into single lines in both Train and Test methods.
- Fixed bug in which Ramped 50/50 trees would not print in Play mode; removed Ramped 50/50 option from Play mode as
it does not make sense for a single Tree
2016 07/24 - version 0.9.1.8b
Sorry. Switched all references to "LOGIC" back to "BOOLEAN". Don't ask.
2016 07/18 - version 0.9.1.8
In karoo_gp_base_class.py
- removed some programmer's comments from experiments resolved
- modified the screen output for the multi-class classifier such that it is more clear that p (a) is actually
predicted (actual) with the addition of the tag "label"
2016 07/15 - version 0.9.1.7
In karoo_gp_base_class.py
- Ramped Half/Half, as originally designed by John Koza is now implemented!
- this offers a greater diversity of trees in the initial population
- added *** Crossover *** to this method for debug display mode
2016 07/14 - version 0.9.1.6b
In karoo_gp_base_class.py
- fixed 'debug' display output for branch mutation
- fixed a few bugs in branch mutation that reduced evolutionary efficiency
- updated all ERROR messages to be uniform in output
- added (y/n) to "Are you certain you want to quit?" message --thanks Hunter!
In karoo_gp_main.py
- reset default evolutionary balance to .1/.1/.1/.7
2016 07/13 - version 0.9.1.6
In karoo_gp_base_class.py
- enabled auto counting of labels (right column) in loaded data (data_y)
- re-ordered presentation of screen modes, in order of decreasing feedback [i, g, m, s]
- added argparse to further enhance karoo_gp_server.py as a scriptable tool --FINALLY! :)
In karoo_gp_main.py
- removed user query for number of class labels
- re-ordered presentation of screen modes, in order of decreasing feedback [i, g, m, s]
In karoo_go_server.py
- re-ordered presentation of screen modes, in order of decreasing feedback [i, g, m, s]
- argparse to further enhance karoo_gp_server.py as a scriptable tool
2016 07/11 - version 0.9.1.5
In karoo_gp_base_class.py
- renamed kernel (fitness function) type from 'a' to 'r' for regression
- renamed all associated variable and methods to match switch from 'abs_value' to 'regression'
- reorganised validation methods to match order of fitness functions
- added [other] place holder to validation methods
In karoo_gp_main.py
- renamed kernel (fitness function) type from 'a' to 'r' for regression
- adjusted upper boundary for qty of generations from 1000 to 100
In karoo_go_server.py
- renamed kernel (fitness function) type from 'a' to 'r' for regression
In files/ data_ABS.csv and functions_ABS.csv were rename data_REGRESS.csv and functions_REGRESS.csv accordingly.
2016 07/10-11 - version 0.9.1.4
In karoo_gp_base_class.py
- renamed variable 'tree_depth_max' to 'tree_depth_base'
- renamed variable 'tree_depth_adj' to 'tree_depth_max'
- renamed variable 'gp.pop_tree_depth_max' to 'gp.pop_tree_depth_base'
- enabled 2 children to be produced by each Crossover function
- extensive testing of Crossover in debug to validate process
- reduced # of crossover functions per run by 1/2
- improved on-screen output for all 4 genetic operators
- removed not-in-use 'accuracy' test
- added 'Arguments required:' to each method notes
- edited a number of method notes
In karoo_gp_main.py
- renamed variables (according to karoo_gp_base_class.py)
In karoo_go_server.py
- renamed variables (according to karoo_gp_base_class.py)
2016 07/08-09 - version 0.9.1.3
In karoo_gp_base_class.py
- added CTRL-C catch to the (pause) menu; removes potential to accidentally kill a run when attempting to
copy/paste an on-screen function to research notes (use the mouse instead).
- rebuilt each (pause) menu function for improved exception handling
- added the new gp.tree_depth_adj user defined variable to branch mutation and crossover, enabling Trees to grow
beyond their original size which adds opportunity for more complex solutions, as well as the unavoidable bloat
- reduced complexity of a few lines in both branch mutation and crossover methods
- tested, tested, tested
In karoo_gp_main.py
- added CTRL-C catch to the at-launch user menu.
- renamed the method gp.fx_karoo_crossover_reproduction() to gp.fx_karoo_crossover()
- added user input for the new global variable gp.tree_depth_adj
In karoo_go_server.py
- added new gp.tree_depth_adj variable
2016 07/07 - version 0.9.1.2
In preparation for public launch of Karoo GP, a number of updates are complete or underway.
The Quick Start Tutorial is being fully revised. A number of corrections were made, but more importantly, all new
content has been added relevant to preparation of datasets and the use of the Karoo GP Tools accordingly. The genetic
operators descriptions now feature visuals and revised descriptions, as to many other sections.
In the karoo_gp/tools/ directory, all scripts have undergone updates, 2 of which now offer automated scaling and a
user interface that in the original versions were not present, as follows:
karoo_data_sort.py (formerly karoo_features_sort.py)
This script now engages the user with a query for the number of class labels and the number of data points (rows)
for the new, randomly generated subset of the parent dataset. This script is designed to be used prior to
karoo_normalise.py.
karoo_normalise.py
This script now auto-scales to any number of columns and rows (within the limit of your computer's capability),
and features a text-based user interface. This script is designed to be used following karoo_data_sort.py.
karoo_multi-classifier.py
This script functions as before, but with a minor bug fixed in which the final class was mislabeled.
karoo_iris_plot.py
This script functions as before, but with improved in-script documentation and cleaner code.
In development now are a number of updates and improvements to the base_class such that Karoo GP will more readily
conform to the GP standards, as follows:
1) Karoo GP currently produces only 1 offspring for each parent, where it should produce 2.
2) The tree generation method 'Ramped Half/Half', in its current state, produces a 50/50 split of Full and Grow
methods, not a graduated ramp through all depths.
3) Karoo GP currently engages a bloat inhibitor, that is, an upper limit on tree depth which is maintained through
all modes of mutation and crossover. This will become a user defined setting such that it can be adjusted, enabling
growth of trees beyond the original, user defined limit.
4) Karoo GP will be made to launch as a single, command-line function with all required parameters included, SciKit
Learn style.
2015 12/23 - version 0.9.1.1
It was discovered that when loading external datasets, Karoo was yet extracting variables (terminals) from the data in
the files/ directory, according to the selected kernel.
This is fixed.
2015 11/04 - version 0.9.1.0
Initial development of Karoo GP began in February 2015, on a Python-based evolutionary algorithm for an MSc research
project at the University of Cape Town (UCT) / African Institute for Mathematical Sciences (AIMS) and the Square
Kilometer Array (SKA). The myriad debug statements evolved into the user interface while the classic Machine Learning
test cases became the built-in example runs.
In the course of six months development, the code base grew to become a flexible, easy-to-use platform for Genetic
Programming.
Karoo GP has been thoroughly tested on a 40-core server at the Square Kilometer Array offices in Cape Town, South
Africa, where for one month it chewed through 10,000 rows of data for up to 50 hours without incident. It is proved
as a fully functional, multi-core workhorse.
With all development to date conducted locally, this version 0.9 marks the first release to github.
This initial github release is private, shared with select collaborators only. Please do not distribute any part of
the code until it is made public.
Kai Staats
www.kaistaats.com/research/
www.kaistaats.com/film/