v1.0 with GPU support

pull/6/head
Kai Staats 2017-02-07 01:33:38 -07:00
parent 0fea36b252
commit d8f460b43c
4 changed files with 807 additions and 766 deletions


@ -1,3 +1,56 @@
2017 02/06
Graphics Processing Units (GPU) are now supported with the introduction of the Python library TensorFlow. The end
result is a staggering improvement in performance. In one comparison using a dataset of 10,000 data points (rows) x 9
features (columns), run on a 40 core Intel Xeon motherboard versus a 2000 core Nvidia GPU card, the wall time was reduced
from 50 hours to less than 4 minutes. On CPU-only computers, the performance on a single core is as much as 10x improved
due to the vectorisation of the data and application of the C-based TensorFlow maths library.
To install TensorFlow, I recommend visiting https://www.tensorflow.org/get_started/. It is straightforward for Ubuntu,
but unfortunately can be rather challenging with OSX. Have patience. Review the forums. It's worth the effort.
I owe many thanks to the expertise of Iurii Milovanov, a contract developer whom I engaged for this effort. While the
number of lines of Karoo GP modified was initially fewer than a dozen (replacing the multi-core pprocess calls), I asked
Iurii to also rewrite the test functions. As such, both Training and Testing are now fully GPU enabled. Thank you!
A number of other changes have been integrated, including:
- Karoo GP is now developed against Python 2.7 as provided with Ubuntu Desktop 16.04.1.
- A number of Python methods have been deleted, added, modified and/or renamed, in particular those in the category
'fx_fitness_'. If you have built your own code based on the Karoo methods, please review this section carefully.
- The user-engaged 'bal'ance function (pause menu) has been rebuilt to accept exact quantities instead of
percentages, enabling the user to define precisely how many of each of the four genetic operators will be applied
with the construction of each subsequent population.
- Activation of the 'test' is now conducted with only the letter 't' and the option to engage a specific number of
'c'ores is removed. Therefore, the 't'imer mode is also removed, as this was a means to discover the optimal number
for multi-core processing which is now automated by TensorFlow.
- The libraries 'pprocess' and 'time' are no longer required nor imported.
- The population_* files (.csv) are now deposited into unique directories created with the launch of each run. A .txt
file is also written to each directory which captures the run-time configuration of Karoo GP. This enables truly
scriptable runs of Karoo.
- The Server interface to Karoo GP (karoo_gp_server.py) now terminates completely, kicking back to the command line.
This enables bash or cron launches of multiple sequential or parallel runs, whether to explore multiple runs with
identical configuration or runs with varied configuration parameters.
Finally, Karoo GP is now a 1.0 release. I never know when to transition from beta to real, so please forgive me if I
jumped the gun. But with GPU support and the revised Server script, I have found Karoo GP to be a fully functional,
powerful machine learning tool. I hope you will agree. --kai
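As an illustration of the scripted runs described in the 2017 02/06 entry, here is a minimal sketch that builds a small
batch of server launches. The flags -ker, -pop, -gen and -fil are those declared in karoo_gp_server.py; the helper name
build_commands and the grid of values are hypothetical, chosen only for the example.

```python
import itertools

# The flags -ker, -pop, -gen and -fil are the ones declared with argparse in
# karoo_gp_server.py. Everything else here (the helper name, the grid of
# values) is hypothetical and only for illustration.
SERVER_SCRIPT = 'karoo_gp_server.py'


def build_commands(pop_sizes, generations, kernel='m', filename='files/data_MATCH.csv'):
    """Return one argument list per (population, generations) combination."""
    commands = []
    for pop, gen in itertools.product(pop_sizes, generations):
        commands.append([
            'python', SERVER_SCRIPT,
            '-ker', kernel,
            '-pop', str(pop),
            '-gen', str(gen),
            '-fil', filename,
        ])
    return commands


commands = build_commands(pop_sizes=[100, 200], generations=[10, 20])
for cmd in commands:
    print(' '.join(cmd))
    # To launch the runs one after another, replace the print with:
    #   subprocess.call(cmd)   # requires 'import subprocess'
```

Because the server script now exits back to the command line when a run ends, each call would return once its run has
finished, so a loop of this kind proceeds sequentially.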
2016 09/20
Fixed the genetic operator (b)alance function to work with larger than 100 trees per population.
Introduced the pause for all runtime modes in the Desktop application, such that the user can apply configurations prior
to the run (eg: change the balance of the genetic operators or the number of engaged cores).
2016 09/19b
After another 2 hours of trouble shooting, I learned that sympy.subs throws the 'zoo' error for a divide-by-zero if
@ -31,11 +84,11 @@ With the 09/14 update I failed to upload the new coefficients.csv file to the fi
this will be the means by which the user can define the constants desired for the Karoo GP run. If you had run Karoo GP
v0.9.2.0 in the past 24 hours without this file, it would have complained. My apology.
Also, a bit of a roadmap for the 2nd half of 2016, into 2017
Also, a bit of a road map for the 2nd half of 2016, into 2017
- validate the new (faster) sympy.lambdify and fully replace the current (slower) sympy.subs
- replace the row-by-row dictionaries with vectors for what should be a significant performance increase
- complete the introduction of constants in a manner more well defined than is currently supported
- investigate replacing pprocess with the multicore library
- investigate replacing pprocess with the multi-core library
- introduce Theano or Tensor Flow for GPU support
I welcome any assistance with these, if anyone has experience and time.
@ -47,8 +100,8 @@ In karoo_gp_base_class.py
- Removed redundant lines in the method 'fx_karoo_data_load()'
- Added support for the Sympy 'lambdify' function in 'fx_karoo_data_load' (see explanation below)
- Added a draft means of catching divide-by-zero errors in the new 'lambdify' function
- Discovered the prior 'fx_eval_subs' incorrected applied a value of 1 to the variable 'result' as a means to
replace the 'zoo' function for divide by zero errors. However, this could inadvertantly undermine the success of
- Discovered the prior 'fx_eval_subs' incorrectly applied a value of 1 to the variable 'result' as a means to
replace the 'zoo' function for divide by zero errors. However, this could inadvertently undermine the success of
Classification and Regression runs. My apology for not catching this sooner.
"While attending the CHEAPR 2016 workshop hosted by the Center for Cosmology and Astro-Particle Physics, The Ohio State
@ -59,7 +112,7 @@ to use, but terribly slow as it relies upon an internal, Python mathematical lib
seeing only a 2x performance increase. Clearly, there are yet other barriers to remove.
In the new 'fx_eval_subs' method you will find both sympy.subs (active) and sympy.lambdify. While preliminary tests
worked well, I witnessed an erractic outcome which I yet need to reproduce and investigate. Feel free to comment the
worked well, I witnessed an erratic outcome which I yet need to reproduce and investigate. Feel free to comment the
.subs and uncomment the .lambdify sections and take it for a spin.
I believe there are 2 more steps to increase performance: removing the dictionaries which contain each row, such that
@ -111,7 +164,7 @@ In karoo_gp_base_class.py
- added (y/n) to "Are you certain you want to quit?" message --thanks Hunter!
In karoo_gp_main.py
- reset default evoluationary balance to .1/.1/.1/.7
- reset default evolutionary balance to .1/.1/.1/.7
@ -214,7 +267,7 @@ user interface that in the original versions were not present, as follows:
This script now auto-scales to any number of columns and rows (within the limit of your computer's capability),
and features a text-based user interface. This script is designed to be used following karoo_data_sort.py.
karoo_multiclassifier.py
karoo_multi-classifier.py
This script functions as before, but with a minor bug fixed in which the final class was mislabeled.
karoo_iris_plot.py

File diff suppressed because it is too large


@ -1,8 +1,8 @@
# Karoo GP Main (desktop)
# Use Genetic Programming for Classification and Symbolic Regression
# by Kai Staats, MSc UCT / AIMS; see LICENSE.md
# Much thanks to Emmanuel Dufourq and Arun Kumar for their support, guidance, and free psychotherapy sessions
# version 0.9.2.1
# by Kai Staats, MSc; see LICENSE.md
# Thanks to Emmanuel Dufourq and Arun Kumar for support during 2014-15 devel; TensorFlow support provided by Iurii Milovanov
# version 1.0
'''
A word to the newbie, expert, and brave--
@ -34,6 +34,7 @@ If you include the path to an external dataset, it will auto-load at launch:
import sys # sys.path.append('modules/') to add the directory 'modules' to the current path
import karoo_gp_base_class; gp = karoo_gp_base_class.Base_GP()
#++++++++++++++++++++++++++++++++++++++++++
# User Defined Configuration |
#++++++++++++++++++++++++++++++++++++++++++
@ -50,10 +51,10 @@ gp.karoo_banner()
print ''
menu = ['b','r','c','m','p','']
menu = ['c','r','m','p','']
while True:
try:
gp.kernel = raw_input('\t Select (r)egression, (c)lassification, (m)atching, or (p)lay (default m): ')
gp.kernel = raw_input('\t Select (c)lassification, (r)egression, (m)atching, or (p)lay (default m): ')
if gp.kernel not in menu: raise ValueError()
gp.kernel = gp.kernel or 'm'; break
except ValueError: print '\t\033[32m Select from the options given. Try again ...\n\033[0;0m'
@ -139,7 +140,7 @@ else: # if any other kernel is selected
except ValueError: print '\t\033[32m Enter a number from 1 including 100. Try again ...\n\033[0;0m'
except KeyboardInterrupt: sys.exit()
menu = ['i','g','m','s','db','t','']
menu = ['i','g','m','s','db','']
while True:
try:
gp.display = raw_input('\t Display (i)nteractive, (g)eneration, (m)inimal, or (s)ilent (default m): ')
@ -150,13 +151,12 @@ else: # if any other kernel is selected
# define the ratio between types of mutation, where all sum to 1.0; can be adjusted in 'i'nteractive mode
gp.evolve_repro = int(0.1 * gp.tree_pop_max) # percentage of subsequent population to be generated through Reproduction
gp.evolve_point = int(0.1 * gp.tree_pop_max) # percentage of subsequent population to be generated through Point Mutation
gp.evolve_branch = int(0.1 * gp.tree_pop_max) # percentage of subsequent population to be generated through Branch Mutation
gp.evolve_cross = int(0.7 * gp.tree_pop_max) # percentage of subsequent population to be generated through Crossover
gp.evolve_repro = int(0.1 * gp.tree_pop_max) # quantity of a population generated through Reproduction
gp.evolve_point = int(0.1 * gp.tree_pop_max) # quantity of a population generated through Point Mutation
gp.evolve_branch = int(0.1 * gp.tree_pop_max) # quantity of a population generated through Branch Mutation
gp.evolve_cross = int(0.7 * gp.tree_pop_max) # quantity of a population generated through Crossover
gp.tourn_size = 10 # qty of individuals entered into each tournament (standard 10); can be adjusted in 'i'nteractive mode
gp.cores = 1 # replace '1' with 'int(gp.core_count)' to auto-set to max; can be adjusted in 'i'nteractive mode
gp.precision = 4 # the number of floating points for the round function in 'fx_fitness_eval'; hard coded
@ -182,8 +182,8 @@ gp.fx_karoo_construct(tree_type, tree_depth_base) # construct the first populati
if gp.kernel != 'p': print '\n We have constructed a population of', gp.tree_pop_max,'Trees for Generation 1\n'
else: # EOL for Play mode
gp.fx_eval_tree_print(gp.tree) # print the current Tree
gp.fx_tree_archive(gp.population_a, 'a') # save this one Tree to disk
gp.fx_display_tree(gp.tree) # print the current Tree
gp.fx_archive_tree_write(gp.population_a, 'a') # save this one Tree to disk
sys.exit()
@ -206,11 +206,13 @@ if gp.display != 's':
if gp.display == 'i': gp.fx_karoo_pause(0)
gp.fx_fitness_gym(gp.population_a) # 1) extract polynomial from each Tree; 2) evaluate fitness, store; 3) display
gp.fx_tree_archive(gp.population_a, 'a') # save the first generation of Trees to disk
gp.fx_archive_tree_write(gp.population_a, 'a') # save the first generation of Trees to disk
# no need to continue if only 1 generation or fewer than 10 Trees were designated by the user
if gp.tree_pop_max < 10 or gp.generation_max == 1:
gp.fx_karoo_eol(); sys.exit()
gp.fx_archive_params_write('Desktop') # save run-time parameters to disk
gp.fx_karoo_eol()
sys.exit()
#++++++++++++++++++++++++++++++++++++++++++
@ -238,14 +240,14 @@ for gp.generation_id in range(2, gp.generation_max + 1): # loop through 'generat
gp.fx_karoo_crossover() # method 4 - Crossover Reproduction
gp.fx_eval_generation() # evaluate all Trees in a single generation
gp.population_a = gp.fx_evo_pop_copy(gp.population_b, ['GP Tree by Kai Staats, Generation ' + str(gp.generation_id)])
gp.population_a = gp.fx_evolve_pop_copy(gp.population_b, ['GP Tree by Kai Staats, Generation ' + str(gp.generation_id)])
#++++++++++++++++++++++++++++++++++++++++++
# "End of line, man!" --CLU |
#++++++++++++++++++++++++++++++++++++++++++
gp.fx_tree_archive(gp.population_b, 'f') # save the final generation of Trees to disk
gp.fx_archive_tree_write(gp.population_b, 'f') # save the final generation of Trees to disk
gp.fx_karoo_eol()
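
A note on the gp.evolve_* assignments in karoo_gp_main.py: because int() truncates, the four quantities can sum to less
than gp.tree_pop_max when the population is not a multiple of 10 (a population of 105 gives 10 + 10 + 10 + 73 = 103).
This diff does not show how Karoo GP treats any shortfall, so the following is only a sketch of one possible approach,
not the method used in the base class. The helper names are hypothetical.

```python
def naive_quantities(pop_max):
    """Mirror the int(ratio * pop_max) pattern used for the gp.evolve_* values."""
    return (int(0.1 * pop_max), int(0.1 * pop_max),
            int(0.1 * pop_max), int(0.7 * pop_max))


def balanced_quantities(pop_max, repro=0.1, point=0.1, branch=0.1):
    """Hypothetical helper: truncate three operators, give Crossover the rest."""
    n_repro = int(repro * pop_max)
    n_point = int(point * pop_max)
    n_branch = int(branch * pop_max)
    # Crossover absorbs whatever truncation left over, so the total is exact.
    n_cross = pop_max - (n_repro + n_point + n_branch)
    return n_repro, n_point, n_branch, n_cross


for pop_max in (100, 105):
    print(pop_max, sum(naive_quantities(pop_max)), sum(balanced_quantities(pop_max)))
```

Assigning the remainder to Crossover is an arbitrary choice for the sketch; any of the four operators could absorb it.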


@ -1,8 +1,8 @@
# Karoo GP Server
# Use Genetic Programming for Classification and Symbolic Regression
# by Kai Staats, MSc UCT / AIMS; see LICENSE.md
# Much thanks to Emmanuel Dufourq and Arun Kumar for their support, guidance, and free psychotherapy sessions
# version 0.9.2.1
# by Kai Staats, MSc; see LICENSE.md
# Thanks to Emmanuel Dufourq and Arun Kumar for support during 2014-15 devel; TensorFlow support provided by Iurii Milovanov
# version 1.0
'''
A word to the newbie, expert, and brave--
@ -14,7 +14,7 @@ of its intent and design.
KAROO GP SERVER
This is the Karoo GP server application. It can be internally scripted, fully command-line configured, or a combination
of both. If this is your first time using Karoo GP, please run the desktop application karoo_gp_main.py first in order
that you come to understand its full functionality.
that you come to understand the full functionality of this particular Genetic Programming platform.
To launch Karoo GP server:
@ -52,18 +52,18 @@ import argparse
import karoo_gp_base_class; gp = karoo_gp_base_class.Base_GP()
ap = argparse.ArgumentParser(description = 'Karoo GP Server')
ap.add_argument('-ker', action = 'store', dest = 'kernel', default = 'm', help = '[r,c,m] fitness function: (r)egression, (c)lassification, or (m)atching')
ap.add_argument('-ker', action = 'store', dest = 'kernel', default = 'm', help = '[c,r,m] fitness function: (c)lassification, (r)egression, or (m)atching')
ap.add_argument('-typ', action = 'store', dest = 'type', default = 'r', help = '[f,g,r] Tree type: (f)ull, (g)row, or (r)amped half/half')
ap.add_argument('-bas', action = 'store', dest = 'depth_base', default = 3, help = '[3...10] maximum Tree depth for the initial population')
ap.add_argument('-max', action = 'store', dest = 'depth_max', default = 3, help = '[3...10] maximum Tree depth for the entire run')
ap.add_argument('-min', action = 'store', dest = 'depth_min', default = 3, help = '[3...100] minimum number of nodes')
ap.add_argument('-pop', action = 'store', dest = 'pop_max', default = 100, help = '[10...1000] maximum population')
ap.add_argument('-gen', action = 'store', dest = 'gen_max', default = 10, help = '[1...100] number of generations')
ap.add_argument('-fil', action = 'store', dest = 'filename', default = 'files/data_MATCH.csv', help = '/path/to_your/data.csv')
ap.add_argument('-fil', action = 'store', dest = 'filename', default = 'files/data_MATCH.csv', help = '/path/to_your/[data].csv')
args = ap.parse_args()
# set the same parameters found in the Karoo GP desktop application, but potentially passed from the command line
# pass the argparse defaults and/or user inputs to the required variables
gp.kernel = str(args.kernel)
tree_type = str(args.type)
tree_depth_base = int(args.depth_base)
@ -73,14 +73,13 @@ gp.tree_pop_max = int(args.pop_max)
gp.generation_max = int(args.gen_max)
filename = str(args.filename)
gp.display = 'n' # display mode is set to (s)ilent
gp.evolve_repro = int(0.1 * gp.tree_pop_max) # percentage of subsequent population to be generated through Reproduction
gp.evolve_point = int(0.1 * gp.tree_pop_max) # percentage of subsequent population to be generated through Point Mutation
gp.evolve_branch = int(0.2 * gp.tree_pop_max) # percentage of subsequent population to be generated through Branch Mutation
gp.evolve_cross = int(0.6 * gp.tree_pop_max) # percentage of subsequent population to be generated through Crossover
gp.display = 's' # display mode is set to (s)ilent
gp.evolve_repro = int(0.1 * gp.tree_pop_max) # quantity of a population generated through Reproduction
gp.evolve_point = int(0.0 * gp.tree_pop_max) # quantity of a population generated through Point Mutation
gp.evolve_branch = int(0.2 * gp.tree_pop_max) # quantity of a population generated through Branch Mutation
gp.evolve_cross = int(0.7 * gp.tree_pop_max) # quantity of a population generated through Crossover
gp.tourn_size = 10 # qty of individuals entered into each tournament (standard 10); can be adjusted in 'i'nteractive mode
gp.cores = 1 # replace '1' with 'int(gp.core_count)' to auto-set to max; can be adjusted in 'i'nteractive mode
gp.precision = 4 # the number of floating points for the round function in 'fx_fitness_eval'; hard coded
# run Karoo GP