Symbolic Regression

This is an open, independent investigation of the following question:

Can propositional theories be evolved?

Can evolution, through natural selection, arrive at a theory that fits a set of unanticipated but observed common-sense facts (in the form of propositions) and makes predictions that fit unobserved events?

For example, suppose we lived in a world with the following propositions:

  • PersonInFrontOfCar
  • YellowLight
  • RedLight
  • Policeman
  • Policecar
  • Slippery
  • Snow
  • Dry
  • Brake

And a large supply of observed situations showing how these appear together, like:

  • YellowLight.
  • ~RedLight.
  • ~Snow.
  • Dry.
  • Policecar.
  • ~PersonInFrontOfCar.
  • Brake.

Can a genetic algorithm evolve and select the following unannounced patterns, exclusively from the observations, to make predictions of future events?

  • PersonInFrontOfCar => Brake.
  • (YellowLight && Policeman && ~Slippery) => Brake.
  • Policecar => Policeman.
  • Snow => Slippery.
  • Slippery => ~Dry.
  • RedLight => Brake.
  • Winter => Snow.

Genetic Programming

Suppose our individuals are propositional theories, in the form of formulas, initialized at random. For example, each individual looks more or less like this:

  • Individual 0: Winter => Dry. Policeman => Brake.
  • Individual 1: Winter => ~Dry. Dry => Snow.
  • Individual 2: YellowLight => ~Dry && Snow.
  • ...
  • Individual n: RedLight => YellowLight.
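One way to represent such individuals (again an assumption of this sketch) is as small lists of implication rules, each a conjunction of literals implying a single literal, generated at random over the proposition vocabulary:

```python
import random

# Assumed vocabulary for this sketch, taken from the examples above.
PROPOSITIONS = ["PersonInFrontOfCar", "YellowLight", "RedLight", "Policeman",
                "Policecar", "Slippery", "Snow", "Dry", "Brake", "Winter"]

def random_literal(rng):
    # A literal is a (proposition, polarity) pair, e.g. ("Dry", False) for ~Dry.
    return (rng.choice(PROPOSITIONS), rng.random() < 0.5)

def random_rule(rng, max_antecedents=3):
    # A rule is (antecedent literals, consequent literal),
    # read as: conjunction of antecedents => consequent.
    n = rng.randint(1, max_antecedents)
    return ([random_literal(rng) for _ in range(n)], random_literal(rng))

def random_individual(rng, max_rules=4):
    # An individual (theory) is a small list of rules.
    return [random_rule(rng) for _ in range(rng.randint(1, max_rules))]
```

With this representation, mutation and crossover can operate on rules and literals directly, without any grammar of full propositional formulas.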

Each individual is evaluated by the degree to which it supports the evidence: how many factual observations fit the theory. For each situation, each theory is tested to see whether it explains it.

For example, suppose that in situation 0 we observe Winter. ~Dry. The theory of Individual 0 directly conflicts with this evidence, making it less useful, evolutionarily speaking. The theory of Individual 1, on the other hand, supports the evidence, making it fitter.
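A possible fitness function (one of many; the exact scoring is a design choice) rewards rules whose firings match the evidence and penalizes firings that contradict it. A minimal sketch, assuming situations are dicts from proposition to truth value and rules are (antecedent literals, consequent literal) pairs:

```python
def literal_holds(lit, situation):
    # A literal is a (proposition, polarity) pair; it holds only if the
    # proposition was observed with exactly that polarity.
    prop, polarity = lit
    return situation.get(prop) == polarity

def fitness(theory, situations):
    # Reward confirmed predictions, penalize contradicted ones.
    # Rules whose antecedents never fire contribute nothing.
    score = 0
    for s in situations:
        for antecedents, (c_prop, c_pol) in theory:
            if all(literal_holds(a, s) for a in antecedents):
                if s.get(c_prop) == c_pol:
                    score += 1   # rule fired, prediction confirmed
                elif c_prop in s:
                    score -= 1   # rule fired, prediction contradicted
    return score

# Situation 0 from the text: Winter. ~Dry.
situation_0 = {"Winter": True, "Dry": False}
# Individual 0: Winter => Dry. Policeman => Brake.
individual_0 = [([("Winter", True)], ("Dry", True)),
                ([("Policeman", True)], ("Brake", True))]
# Individual 1: Winter => ~Dry. Dry => Snow.
individual_1 = [([("Winter", True)], ("Dry", False)),
                ([("Dry", True)], ("Snow", True))]
```

Here fitness(individual_0, [situation_0]) is -1 (Winter => Dry fires and is contradicted) while fitness(individual_1, [situation_0]) is +1, matching the selection pressure described above.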

If natural selection ran through a number of generations, would we arrive at the following?

  • PersonInFrontOfCar => Brake.
  • (YellowLight && Policeman && ~Slippery) => Brake.
  • Policecar => Policeman.
  • Snow => Slippery.
  • Slippery => ~Dry.
  • RedLight => Brake.
  • Winter => Snow.
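The selection loop itself can stay entirely generic. A minimal sketch (the truncation-selection scheme and mutation-only reproduction here are arbitrary choices for illustration, not a claim about the best operator set):

```python
import random

def evolve(population, fitness, mutate, generations, rng):
    # Generic generational loop: rank by fitness, keep the top half,
    # refill the population with mutated copies of the survivors.
    # (Crossover is omitted for brevity; a real run would likely add it.)
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: len(population) // 2]
        population = survivors + [
            mutate(rng.choice(survivors), rng)
            for _ in range(len(population) - len(survivors))
        ]
    return population
```

Because fitness and mutate are parameters, the same loop runs unchanged whether the individuals are propositional theories or anything else.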

Marginal Costs

Arriving at that theory alone isn't sufficient: the algorithm has to work for any input without modification.

For example, if we changed the observations to a completely different scenario, could a foundational genetic algorithm, with zero marginal cost (i.e. without any recompilation cycle), derive theories that fit any propositional evidence?

For example, given:

  • cat_fur || dog_fur.
  • ~thompson_allergy.
  • macavity_criminal.

Can the algorithm derive the following theory, without any recompilation cycle from the past theory?

  • dog_fur => thompson_allergy.
  • cat_fur => macavity_criminal.
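Zero marginal cost falls out naturally if the algorithm reads its vocabulary from the observations instead of hard-coding it. A sketch, assuming situations are encoded as dicts from proposition to truth value (a disjunctive observation like cat_fur || dog_fur would need a richer encoding than a plain truth assignment):

```python
def vocabulary(situations):
    # Infer the proposition set from the data itself, so a new domain
    # needs new observations only -- no code changes, no recompilation.
    return sorted({prop for s in situations for prop in s})

# The same machinery, fed the new scenario's atomic observations:
new_situations = [
    {"thompson_allergy": False, "macavity_criminal": True},
]
```

vocabulary(new_situations) returns ['macavity_criminal', 'thompson_allergy'], and everything downstream (random rules, fitness, selection) can be driven by that inferred set.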

Related Work

Open Questions

  • Would it work for first-order logic too?
  • Would it be able to derive taxonomies?
  • Would it be able to deal with correlations?
  • Would it derive rules that can't be understood because they are missing terms?
  • Would the search space be so big that it would need guidance on where to look?
  • What is the algorithmic complexity of this? Is it proportional to the number of observations? To the complexity of the rules?

Interested? Find me!