karoo_gp/RELEASE_NOTES.txt

273 lines
12 KiB
Plaintext
Raw Normal View History

2016-09-19 20:15:24 -06:00
2016 09/19b
After another 2 hours of trouble shooting, I learned that sympy.subs throws the 'zoo' error for a divide-by-zero if
working with integers--and stalls (as I witnessed last year when developing Karoo), but if throws 'inf' or '-inf' if
working with floats and continues to process unencumbered. This means the 'zoo' trap has not been used for over a year.
It is now removed from the method fx_eval_subs().
2016-09-19 01:13:59 -06:00
2016 09/19
All experiments with lambdify are on hold as it is throwing 'divide by zero' errors when no 0 exists in the data.
In the evaluation of this issue, I returned to the means by which I deal with 'divide by zero', independent of the
library used to process the trees. Originally, 'result' was replaced by the value 1. But a colleague suggested that I
might be tossing genetic material of value to future generations. So, now a tree which produces a 'divide by zero'
error ('zoo' in .subs) is retained, but given a fitness score of 0.
Method fx_eval_subs() is updated to assign an 'error' tag to any tree which exhibits 'divide by zero' behaviour.
Method fx_fitness_eval() is updated to bypass processing of a tree which has been tagged with an 'error' tag.
Methods fx_test_classify(), fx_test_regress(), and fx_test_match() are each updated to send the 'tree_id' to
fx_eval_subs() as is now required.
For now, it is not recommended that you use the lambdify function in fx_eval_subs(), unless you know how to fix it.
2016-09-19 01:13:59 -06:00
2016-09-16 15:53:14 -06:00
2016 09/16
With the 09/14 update I failed to upload the new coefficients.csv file to the files/ directory. While not yet engaged,
this will be the means by which the user can define the constants desired for the Karoo GP run. If you had run Karoo GP
v0.9.2.0 in the past 24 hours without this file, it would have complained. My apology.
Also, a bit of a roadmap for the 2nd half of 2016, into 2017
- validate the new (faster) sympy.lambdify and fully replace the current (slower) sympy.subs
- replace the row-by-row dictionaries with vectors for what should be a significant performance increase
- complete the introduction of constants in a manner more well defined than is currently supported
- investigate replacing pprocess with the multicore library
- introduce Theano or Tensor Flow for GPU support
I welcome any assistance with these, if anyone has experience and time.
2016-09-14 19:41:13 -06:00
2016 09/14 - version 0.9.2.0
In karoo_gp_base_class.py
2016-09-19 01:13:59 -06:00
- Removed redundant lines in the method 'fx_karoo_data_load()'
2016-09-14 19:41:13 -06:00
- Added support for the Sympy 'lambdify' function in 'fx_karoo_data_load' (see explanation below)
- Added a draft means of catching divide-by-zero errors in the new 'lambdify' function
- Discovered the prior 'fx_eval_subs' incorrected applied a value of 1 to the variable 'result' as a means to
replace the 'zoo' function for divide by zero errors. However, this could inadvertantly undermine the success of
Classification and Regression runs. My apology for not catching this sooner.
"While attending the CHEAPR 2016 workshop hosted by the Center for Cosmology and Astro-Particle Physics, The Ohio State
2016-09-19 01:13:59 -06:00
University, Michael Zevin of Northwestern University proposed that Karoo GP *should* be able to process trees far faster
than what we were seeing. I looked into the Sympy functions I was at that time using. Indeed, '.subs' is noted as easy
to use, but terribly slow as it relies upon an internal, Python mathematical library. I therefore replaced '.subs' with
2016-09-14 19:41:13 -06:00
'.lambdify' which calls upon the C-based Numpy maths library. It is slated to be 500x faster than '.subs', but I am
seeing only a 2x performance increase. Clearly, there are yet other barriers to remove.
In the new 'fx_eval_subs' method you will find both sympy.subs (active) and sympy.lambdify. While preliminary tests
worked well, I witnessed an erractic outcome which I yet need to reproduce and investigate. Feel free to comment the
.subs and uncomment the .lambdify sections and take it for a spin.
I believe there are 2 more steps to increase performance: removing the dictionaries which contain each row, such that
Karoo is working directly with the Numpy array again, and then processing the array as a vector instead. But this will
require substantial recoding.
I'll keep you informed ..." --kai
2016-08-10 00:09:13 -06:00
2016 08/08 - version 0.9.1.9
In karoo_gp_base_class.py
2016-09-19 01:13:59 -06:00
- Created a new method 'fx_eval_subs()' which conducts the SymPy subs function for both train and test data.
2016-08-10 00:09:13 -06:00
- Consolidated duplicate SymPy sub calls in both Train and Test methods, at Erik H. suggestion --thank you!
- Consolidated duplicate lines into single lines in both Train and Test methods.
- Fixed bug in which Ramped 50/50 trees would not print in Play mode; removed Ramped 50/50 option from Play mode as
it does not make sense for a single Tree
2016-07-24 23:02:12 -06:00
2016 07/24 - version 0.9.1.8b
Sorry. Switched all references to "LOGIC" back to "BOOLEAN". Don't ask.
2016-07-18 07:25:47 -06:00
2016 07/18 - version 0.9.1.8
In karoo_gp_base_class.py
- removed some programmer's comments from experiments resolved
- modified the screen output for the multi-class classifier such that it is more clear that p (a) is actually
predicted (actual) with the addition of the tag "label"
2016-08-10 00:09:13 -06:00
2016 07/15 - version 0.9.1.7
2016-07-15 00:02:28 -06:00
In karoo_gp_base_class.py
- Ramped Half/Half, as originally designed by John Koza is now implemented!
- this offers a greater diversity of trees in the initial population
- added *** Crossover *** to this method for debug display mode
2016 07/14 - version 0.9.1.6b
In karoo_gp_base_class.py
- fixed 'debug' display output for branch mutation
- fixed a few bugs in branch mutation that reduced evolutionary efficiency
- updated all ERROR messages to be uniform in output
- added (y/n) to "Are you certain you want to quit?" message --thanks Hunter!
In karoo_gp_main.py
- reset default evoluationary balance to .1/.1/.1/.7
2016-07-13 22:52:42 -06:00
2016 07/13 - version 0.9.1.6
In karoo_gp_base_class.py
- enabled auto counting of labels (right column) in loaded data (data_y)
- re-ordered presentation of screen modes, in order of decreasing feedback [i, g, m, s]
- added argparse to further enhance karoo_gp_server.py as a scriptable tool --FINALLY! :)
In karoo_gp_main.py
- removed user query for number of class labels
- re-ordered presentation of screen modes, in order of decreasing feedback [i, g, m, s]
In karoo_go_server.py
- re-ordered presentation of screen modes, in order of decreasing feedback [i, g, m, s]
- argparse to further enhance karoo_gp_server.py as a scriptable tool
2016 07/11 - version 0.9.1.5
In karoo_gp_base_class.py
- renamed kernel (fitness function) type from 'a' to 'r' for regression
- renamed all associated variable and methods to match switch from 'abs_value' to 'regression'
- reorganised validation methods to match order of fitness functions
- added [other] place holder to validation methods
In karoo_gp_main.py
- renamed kernel (fitness function) type from 'a' to 'r' for regression
- adjusted upper boundary for qty of generations from 1000 to 100
In karoo_go_server.py
- renamed kernel (fitness function) type from 'a' to 'r' for regression
In files/ data_ABS.csv and functions_ABS.csv were rename data_REGRESS.csv and functions_REGRESS.csv accordingly.
2016 07/10-11 - version 0.9.1.4
In karoo_gp_base_class.py
- renamed variable 'tree_depth_max' to 'tree_depth_base'
- renamed variable 'tree_depth_adj' to 'tree_depth_max'
- renamed variable 'gp.pop_tree_depth_max' to 'gp.pop_tree_depth_base'
- enabled 2 children to be produced by each Crossover function
- extensive testing of Crossover in debug to validate process
- reduced # of crossover functions per run by 1/2
- improved on-screen output for all 4 genetic operators
- removed not-in-use 'accuracy' test
- added 'Arguments required:' to each method notes
- edited a number of method notes
In karoo_gp_main.py
- renamed variables (according to karoo_gp_base_class.py)
In karoo_go_server.py
- renamed variables (according to karoo_gp_base_class.py)
2016 07/08-09 - version 0.9.1.3
In karoo_gp_base_class.py
- added CTRL-C catch to the (pause) menu; removes potential to accidentally kill a run when attempting to
copy/paste an on-screen function to research notes (use the mouse instead).
- rebuilt each (pause) menu function for improved exception handling
- added the new gp.tree_depth_adj user defined variable to branch mutation and crossover, enabling Trees to grow
beyond their original size which adds opportunity for more complex solutions, as well as the unavoidable bloat
- reduced complexity of a few lines in both branch mutation and crossover methods
- tested, tested, tested
In karoo_gp_main.py
- added CTRL-C catch to the at-launch user menu.
- renamed the method gp.fx_karoo_crossover_reproduction() to gp.fx_karoo_crossover()
- added user input for the new global variable gp.tree_depth_adj
In karoo_go_server.py
- added new gp.tree_depth_adj variable
2016 07/07 - version 0.9.1.2
In preparation for public launch of Karoo GP, a number of updates are complete or underway.
The Quick Start Tutorial is being fully revised. A number of corrections were made, but more importantly, all new
content has been added relevant to preparation of datasets and the use of the Karoo GP Tools accordingly. The genetic
operators descriptions now feature visuals and revised descriptions, as to many other sections.
In the karoo_gp/tools/ directory, all scripts have undergone updates, 2 of which now offer automated scaling and a
user interface that in the original versions were not present, as follows:
karoo_data_sort.py (formerly karoo_features_sort.py)
This script now engages the user with a query for the number of class labels and the number of data points (rows)
for the new, randomly generated subset of the parent dataset. This script is designed to be used prior to
karoo_normalise.py.
karoo_normalise.py
This script now auto-scales to any number of columns and rows (within the limit of your computer's capability),
and features a text-based user interface. This script is designed to be used following karoo_data_sort.py.
karoo_multiclassifier.py
This script functions as before, but with a minor bug fixed in which the final class was mislabeled.
karoo_iris_plot.py
This script functions as before, but with improved in-script documentation and cleaner code.
In development now are a number of updates and improvements to the base_class such that Karoo GP will more readily
conform to the GP standards, as follows:
1) Karoo GP currently produces only 1 offspring for each parent, where it should produce 2.
2) The tree generation method 'Ramped Half/Half', in its current state, produces a 50/50 split of Full and Grow
methods, not a graduated ramp through all depths.
3) Karoo GP currently engages a bloat inhibitor, that is, an upper limit on tree depth which is maintained through
all modes of mutation and crossover. This will become a user defined setting such that it can be adjusted, enabling
growth of trees beyond the original, user defined limit.
4) Karoo GP will be made to launch as a single, command-line function with all required parameters included, SciKit
Learn style.
2015 12/23 - version 0.9.1.1
It was discovered that when loading external datasets, Karoo was yet extracting variables (terminals) from the data in
the files/ directory, according to the selected kernel.
This is fixed.
2015 11/04 - version 0.9.1.0
Initial development of Karoo GP began in February 2015, on a Python-based evolutionary algorithm for an MSc research
project at the University of Cape Town (UCT) / African Institute for Mathematical Sciences (AIMS) and the Square
Kilometer Array (SKA). The myriad debug statements evolved into the user interface while the classic Machine Learning
test cases became the built-in example runs.
In the course of six months development, the code base grew to become a flexible, easy-to-use platform for Genetic
Programming.
Karoo GP has been thoroughly tested on a 40-core server at the Square Kilometer Array offices in Cape Town, South
Africa, where for one month it chewed through 10,000 rows of data for up to 50 hours without incident. It is proved
as a fully functional, multi-core workhorse.
With all development to date conducted locally, this version 0.9 marks the first release to github.
This initial github release is private, shared with select collaborators only. Please do not distribute any part of
the code until it is made public.
Kai Staats
www.kaistaats.com/research/
www.kaistaats.com/film/