Sunday, December 27, 2020

One spider leg, dancing

I am claiming the points for having completed my experiment in letting machine learning do the dull parts of programming the movements of a single spider leg.
I ran something over 3,000 leg-movement actions (only a proof of concept, not enough for any real skill to be learned) directed by a TensorFlow-based deep SARSA reinforcement learning agent.
Most of the code came from Gelana Tostaeva, in an article she published on Medium named

Learning with Deep SARSA & OpenAI Gym.

After those 3,000 steps, I had learned something important. The agent was learning from its experience with my 4-servo spider leg and the Python code I wrote to represent the leg as an OpenAI Gym environment. My reward code was letting it learn to make big dramatic gestures without actually overheating the servos. Mwahahahaaa.
(There were a bunch of non-zero values in the Q matrix when I stopped, which I take as evidence of some bits of learning.)
However, eventually the jerky, fast, full-arc drama of the gestures did overstress the servos and broke the gears in one. I expect the agent might eventually learn that it has to ease up to full speed and ease back to a stop, but to save on costs I will be futzing with the reward design a bit. Thanks, Gelana!
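For the reward futzing, the shape I have in mind is something like the sketch below: reward movement, but penalize any per-step angle change big enough to strain the gears. The function name, the degree threshold, and the penalty weight are all hypothetical placeholders, not the code from the actual run.

```python
import numpy as np

def leg_reward(prev_positions, positions, max_step=10.0):
    """Hypothetical reward for a 4-servo leg: encourage movement,
    but penalize jerky jumps that strain the gears.

    prev_positions, positions: sequences of 4 servo angles in degrees.
    max_step: largest per-step angle change considered gentle.
    """
    deltas = np.abs(np.asarray(positions, dtype=float)
                    - np.asarray(prev_positions, dtype=float))
    movement = deltas.sum()                                   # doing something is good
    jerk_penalty = np.clip(deltas - max_step, 0, None).sum()  # big jumps are bad
    return movement - 3.0 * jerk_penalty
```

With this shaping, four gentle 5-degree moves score 20.0, while one violent 20-degree lunge on a single servo nets a negative reward, so easing up to speed should look better to the agent than drama.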

Sunday, December 13, 2020

Leg Agency

I have been rewriting some Python code to be multiprocess so that it can be writing to and reading from the serial port "at the same time".

That is in good shape now, though I worry a bit about how many processes I have broken the work up into.  There is a monitor process and then two for each serial port that has servos attached (one reads and the other writes).
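The process split looks roughly like the sketch below: a monitor hands commands to a writer process, and a reader process hands replies back. This is a stand-in, not the real SpiderWhisperer code; the "wire" queue replaces the actual serial port (a real version would wrap pyserial reads and writes), and all the names are made up.

```python
from multiprocessing import Process, Queue

def writer(cmd_q, wire_q):
    """Drain servo commands and push them onto the 'wire' (stand-in for port.write)."""
    while True:
        cmd = cmd_q.get()
        if cmd is None:          # sentinel: shut down
            break
        wire_q.put(cmd)          # real version: port.write(cmd.encode())

def reader(wire_q, reply_q):
    """Pull data off the 'wire' and hand replies to the monitor (stand-in for port.readline)."""
    while True:
        data = wire_q.get()
        if data is None:         # sentinel: shut down
            break
        reply_q.put("ack:" + data)

def run_round_trip(commands):
    """Monitor-process role: send commands, collect replies, shut workers down."""
    cmd_q, wire_q, reply_q = Queue(), Queue(), Queue()
    w = Process(target=writer, args=(cmd_q, wire_q))
    r = Process(target=reader, args=(wire_q, reply_q))
    w.start(); r.start()
    for c in commands:
        cmd_q.put(c)
    replies = [reply_q.get() for _ in commands]
    cmd_q.put(None); wire_q.put(None)   # stop both workers
    w.join(); r.join()
    return replies
```

The point of the split is that a slow blocking read never stalls a write: each direction of the port gets its own process, and the monitor only ever touches queues.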

I took sample code from "Create custom gym environments from scratch — A stock market example" by Adam King on Medium. That gave me the wrapper methods that make my multiprocess SpiderWhisperer comply with the gym environment API from OpenAI.
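The shape of that wrapper is roughly the sketch below: a class exposing the Gym-style reset/step methods over the leg. To stay self-contained it does not import gym or touch the serial layer, and the class name, angle ranges, and placeholder reward are all hypothetical.

```python
class SpiderLegEnv:
    """Minimal sketch of a Gym-style environment for a 4-servo spider leg.

    Follows the OpenAI Gym reset/step API without importing gym itself;
    a real version would forward angles to the serial writer process and
    read observed positions back from the reader.
    """

    N_SERVOS = 4
    ANGLE_MIN, ANGLE_MAX = 0, 180

    def __init__(self):
        self.angles = [90] * self.N_SERVOS   # start with every servo centered

    def reset(self):
        """Return the leg to the centered pose and return the observation."""
        self.angles = [90] * self.N_SERVOS
        return list(self.angles)

    def step(self, action):
        """action: list of 4 angle deltas, clamped to the servo range."""
        for i, delta in enumerate(action):
            self.angles[i] = max(self.ANGLE_MIN,
                                 min(self.ANGLE_MAX, self.angles[i] + delta))
        reward = sum(abs(d) for d in action)  # placeholder: reward any movement
        done = False                          # an episode-length cap would go here
        return list(self.angles), reward, done, {}
```

An agent only ever sees reset() and step(), so the multiprocess plumbing underneath can change without the learning code noticing.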

In order to have an RL agent learn how to move my spider leg while only developing the sort of familiarity with actual machine-learning math that one gets from sharing a friendly nod across a noisy party, I stripped out pieces of sample code from the Gym tutorial here.  That gave me the rest of the code I needed for some conceptual experimental runs.

I hesitate to mention that there was some debugging.  Since I am intent on learning from the noble creatures in the phylum arthropoda, I will say that some adjustments to the implementation were required that I discovered by trying to run the RL scenario using code I shuffled together.

Now I am ready to try a long run and see if an RL agent that could learn to maneuver a cart up a mountain from a valley can also learn to make interesting movements with a 4-servo spider leg, without burning out a lot of servos.

Sunday, November 1, 2020

Spider Leg exercises in the Gym

After doing some more thorough reading and experimentation with the tutorials, I have decided that the Google DeepMind control components are more heavyweight than I need. They depend on MuJoCo, a pay-to-license physics simulation engine that I don't need. So instead I am narrowing my list of components to DeepMind Acme (dm_acme), the DeepMind environment package (dm_env), and OpenAI Gym. Together they provide a framework, and sample cases, for easier implementation of reinforcement learning agents.
ps: If you read this post aloud to someone, be thoughtful about where you place the pauses when you say "Google DeepMind control".

Friday, October 23, 2020

Deep Minded Acme Spiders

The Google DeepMind projects include some demos and examples that sound like an excellent base for what I want to try for the SpiderDroid. Now let us see if I can absorb enough new comp-sci concepts to produce any interesting behaviors.

Tuesday, October 13, 2020

SpiderDroid Rewards

The core mechanism of using machine learning to "solve" games is building in an incentive: a reward that the learning algorithm uses to identify the best values for the parameters it controls. For my experiments, I think I will also need an external, pure incentive/disincentive stimulus for any early assisted-learning stages. This is because I expect the learning process that connects actuator parameters to movement results will be full of dead ends, and I want to be able to back the ML algorithm out of them and encourage the patterns that look more productive.
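One way to sketch that external stimulus: blend the environment's own reward with an operator-supplied +1/-1 nudge before it reaches a standard tabular update. Everything here is hypothetical; the function names, the nudge weight, and the learning constants are placeholders, not a committed design.

```python
from collections import defaultdict

def shaped_reward(env_reward, operator_signal=0.0, weight=5.0):
    """Blend the environment's reward with an external operator nudge.

    operator_signal: +1 to encourage, -1 to discourage, 0 once the
    agent is past assisted learning (hypothetical scheme).
    """
    return env_reward + weight * operator_signal

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """One tabular SARSA step; the shaped reward plugs in as r."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])
```

With the nudge weight larger than typical environment rewards, a disapproving -1 can flip an otherwise positive step negative, which is exactly the lever needed to back the agent out of a dead end.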

However, it seems relevant to mention the unfortunate babbling idiocy that eventually forced me to erase and restart all of the early (local-brains-only) voice-to-text translators I tried.

Caveat creator.  Do I need a kill switch?


I will know that I have reached a new plateau when my spider droids can learn movement on their own by experimentally moving whatever collection of active joints I have given them. I expect the first incentivized results will be discovering the most productive movements, and combinations of movements, in terms of visibly and inertially detectable changes in the position and location of a single eye cluster, and detectable changes in sound composition and volume at the single ear cluster. If I get that far, there will be others even more interesting.

Tuesday, May 12, 2020

SpiderDroid 0.02: 3D Models

I have published the 3D Models for the upper (and most interesting) pieces of the leg design of the 0.02 SpiderDroid in TinkerCad under easy free licensing.