Thought about it some more, so I'll go full autistic on this idea and on what the potential problems with it are:
First of all, there's the question of what kind of input we learn from here: you can use raw input (basically a map with the units on screen), or have someone turn that input into some hyper parameters (hand-picked features, like enemy clustering or the number of small formations as a dedicated number).
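To make the second option concrete, here's a rough sketch of boiling a raw unit map down to a few dedicated numbers. Everything in it is my own assumption for illustration: the names `formations` and `extract_features`, the `link_radius` and `small_size` thresholds, and the idea that two units belong to the same formation if they stand within `link_radius` of each other.

```python
from itertools import combinations
from math import dist


def formations(units, link_radius=2.0):
    """Group units into formations with union-find.

    Assumption: two units are in the same formation if they are within
    link_radius of each other, directly or through a chain of units.
    """
    parent = list(range(len(units)))

    def find(i):
        # Walk up to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(units)), 2):
        if dist(units[i], units[j]) <= link_radius:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(units)):
        groups.setdefault(find(i), []).append(units[i])
    return list(groups.values())


def extract_features(units, link_radius=2.0, small_size=3):
    """Turn raw (x, y) positions into a handful of hand-picked numbers."""
    groups = formations(units, link_radius)
    pairs = list(combinations(units, 2))
    mean_spacing = sum(dist(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return {
        "mean_spacing": mean_spacing,  # crude clustering measure
        "small_formations": sum(1 for g in groups if len(g) <= small_size),
        "largest_formation": max((len(g) for g in groups), default=0),
    }


# Hypothetical map: one blob of four, one pair, one straggler.
units = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (30, 5)]
features = extract_features(units)
```

Instead of one free value per map cell, the optimizer now only sees three numbers, which is the whole point of doing it this way.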
Considering you want a player-dedicated AI (rather than one built by aggregating multiple users), learning from raw input is completely infeasible unless you want the player to play thousands of games (since more free elements -> more things to optimize). And that's without going into the fact that we still struggle to make a car AI that doesn't randomly run over people. Things like formations are far harder to program than the average person thinks.
So with these hyper parameters you now need a way to optimize them, which means a value function of win/tie/lose. Now the vast majority of players will not want to play a game they lose too much, so you can't start the game in a position where the PC can curbstomp the player. That means the AI will rarely win, so win/lose on its own gives you very little to optimize on, and the best way to optimize is by some heuristic like the amount of damage inflicted on the player. There are a ton of heuristics and you can fuck yourself hard if you pick the wrong one. For example, if the game has a single point the AI needs to destroy to win, but the heuristic is the amount of damage done to buildings, then you'd get an AI that ignores the target goal and wastes time destroying barricades (since more damage -> higher score). This also has another problem, which I'll get to later.
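The barricade example can be put in numbers. This is only a toy sketch: the two results are made up, and `naive_score`, `objective_aware_score`, `win_bonus` and `side_weight` are names and values I picked for illustration, not a recommendation for what the heuristic should actually be.

```python
def naive_score(result):
    """Heuristic from the example: total damage done to any building."""
    return sum(result["damage"].values())


def objective_aware_score(result, objective="core", win_bonus=1000.0, side_weight=0.1):
    """One possible fix (an assumption): damage to the objective counts in full,
    everything else is heavily discounted, and actually winning dominates."""
    damage = result["damage"]
    score = damage.get(objective, 0.0)
    score += side_weight * sum(v for k, v in damage.items() if k != objective)
    if result["objective_destroyed"]:
        score += win_bonus
    return score


# Two hypothetical AIs playing the same map.
barricade_smasher = {
    "damage": {"barricade": 500.0, "core": 0.0},
    "objective_destroyed": False,
}
core_rusher = {
    "damage": {"barricade": 50.0, "core": 200.0},
    "objective_destroyed": True,
}

for name, result in [("smasher", barricade_smasher), ("rusher", core_rusher)]:
    print(name, naive_score(result), objective_aware_score(result))
```

Under the naive heuristic the smasher looks twice as good as the AI that actually won. The second scoring flips that, but it comes with its own knobs to get wrong.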
So a potential game could have multiple waves of zombies attacking the player, each wave with its own AI. This gives you some number of samples to optimize on (which would take a millisecond between rounds), but here's another problem: if one wave wrecked the player's shit and the next one beats what's left of his forces, then the latter wave gets counted as more important than the former, which is incorrect and will lead to wrong optimization. You also want the game to still be fun and functional, so you need to watch out for cases where the AI basically fucks off and the player has to wait for something to happen.
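Here's the wave problem as a toy calculation. The numbers are invented, and both function names are hypothetical. Splitting credit by share of damage is just one possible way to do it, and it has its own flaw: a later wave can never deal more damage than the player has left.

```python
def win_only_credit(strength_after):
    """Naive: all credit goes to whichever wave brought the player to zero."""
    return [1.0 if after <= 0 else 0.0 for after in strength_after]


def damage_share_credit(strength_before, strength_after):
    """Assumed alternative: credit each wave by its share of the total
    strength the player lost over the whole game."""
    losses = [before - after for before, after in zip(strength_before, strength_after)]
    total = sum(losses)
    return [loss / total if total > 0 else 0.0 for loss in losses]


# Player strength at the start and end of each wave (made-up numbers):
# wave 1 takes the player from 100 to 20, wave 2 finishes off the last 20.
strength_before = [100.0, 20.0]
strength_after = [20.0, 0.0]

print(win_only_credit(strength_after))                       # [0.0, 1.0]
print(damage_share_credit(strength_before, strength_after))  # [0.8, 0.2]
```

With win-only scoring the cleanup wave gets everything and the wave that did the real work gets nothing, which is exactly the wrong optimization described above.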
Finally there's a user problem, which is summarized with this clip:
The player can basically cheese the entire process, get the shittiest AI rated as the best one by the heuristic, and completely break the game.