numpy - Python method changing all key-value pairs in a dictionary
I am in the process of building a little Python script that learns to play tic-tac-toe. My process is to store each move made in a game and score the move based on whether it leads to a winning outcome. I then attempt to train on many rounds of play.
My problem lies in the update_weights() method. I expect it to take the stored moves (accessed from the board object and represented as a list of [row,col]) and iterate through the list of moves. The method should reference the board's stored weights (a dictionary of (3,3) numpy arrays) and update the corresponding weight for the appropriate move.
E.g. assume a win occurred. In the winning sequence, move #2 was at board position [0,1]. The method should access the dictionary of weights (the keys are the move #) and multiply position [0,1] of that array by a factor of 1.05.
The problem is that the method is changing all arrays in the weight dictionary, not just the one associated with the correct move # key. I can't figure out how this is happening.
    import numpy as np
    import random

    class ttt_board():
        def __init__(self):
            self.board_state = np.array([[0,0,0],[0,0,0],[0,0,0]])
            self.board_weight = self.reset_board_weights()
            self.moves = []

        def reset_board_weights(self):
            board_weight_instance = np.zeros((3,3))
            board_weight_instance[board_weight_instance >= 0] = 0.5
            board_weight = {0: board_weight_instance,
                            1: board_weight_instance,
                            2: board_weight_instance,
                            3: board_weight_instance,
                            4: board_weight_instance}
            return board_weight

        def reset_board(self):
            self.board_state = np.array([[0,0,0],[0,0,0],[0,0,0]])

        def reset_moves(self):
            self.moves = []

        def is_win(self):
            board = self.board_state
            if board.trace() == 3 or np.flipud(board).trace() == 3:
                return True
            for i in range(3):
                if board.sum(axis=0)[i] == 3 or board.sum(axis=1)[i] == 3:
                    return True
            else:
                return False

        def is_loss(self):
            board = self.board_state
            if board.trace() == 12 or np.flipud(board).trace() == 12:
                return True
            for i in range(3):
                if board.sum(axis=0)[i] == 12 or board.sum(axis=1)[i] == 12:
                    return True
            else:
                return False

        def is_tie(self):
            board = self.board_state
            board_full = True
            for i in range(len(board)):
                for k in range(len(board)):
                    if board[i][k] == 0:
                        board_full = False
            if board_full and not self.is_win() and not self.is_loss():
                return True
            else:
                return False

        def update_board(self,player,space):
            # takes player as 1 or 4
            # takes space as a list, e.g. [0,0]
            self.board_state[space[0],space[1]] = player
            if player == 1:
                self.store_move(space)
            return

        def get_avail_spots(self):
            avail_spots = []
            board = self.board_state
            for i in range(len(board)):
                for k in range(len(board)):
                    if board[i][k] == 0:
                        avail_spots.append([i,k])
            return avail_spots

        def gen_next_move(self):
            avail_spots = self.get_avail_spots()
            move = random.randrange(len(avail_spots))
            return avail_spots[move]

        def update_weights(self,win):
            moves = self.moves
            if win:
                factor = 1.05
            else:
                factor = 0.95
            for i in range(len(moves)):
                row = moves[i][0]
                col = moves[i][1]
                old_weight = self.board_weight[i][row,col]
                new_weight = old_weight*factor
                self.board_weight[i][row,col] = new_weight
            return

        def store_move(self,move):
            self.moves.append(move)
            return

    if __name__ == '__main__':
        board = ttt_board()
        while not board.is_win() and not board.is_loss() and not board.is_tie():
            try:
                board.update_board(1,board.gen_next_move())
                board.update_board(4,board.gen_next_move())
            except ValueError:
                break
        if board.is_win():
            board.update_weights(1)
            print('player 1 wins: {w}'.format(w=board.is_win()))
        elif board.is_loss():
            board.update_weights(0)
            print('player 2 wins: {l}'.format(l=board.is_loss()))
        elif board.is_tie():
            print('game ends in tie: {t}'.format(t=board.is_tie()))
        print('here is the final board')
        print(board.board_state)
        print(board.board_weight)
        print(board.moves)
As you can see from running this script, the printed dictionary of weights after a single game has identical array values for each key. I would expect each array to be changed in only one position, the one accessed by the move # that its key corresponds to.
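The problem is that every entry in the dictionary shares the same reference to the single board_weight_instance array, so a change made through one key is visible through all of them: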
    board_weight_instance = np.zeros((3,3))
    board_weight_instance[board_weight_instance >= 0] = 0.5
    board_weight = {0: board_weight_instance,
                    1: board_weight_instance,
                    2: board_weight_instance,
                    3: board_weight_instance,
                    4: board_weight_instance}
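The aliasing can be reproduced outside the class. The following is a minimal sketch, not part of the original script; the names `shared` and `weights` are made up for illustration, and `np.full` is used only as a shorthand for filling the array with 0.5.

```python
import numpy as np

# One array object stored under five keys, the same pattern as in
# reset_board_weights() (the names here are illustrative only).
shared = np.full((3, 3), 0.5)
weights = {i: shared for i in range(5)}

# Every key refers to the very same object, not to a copy.
same_object = all(weights[i] is shared for i in range(5))

# Updating "move #2" at position [0, 1] therefore shows up under every key.
weights[2][0, 1] *= 1.05
changed_everywhere = all(
    weights[i][0, 1] == weights[2][0, 1] for i in range(5)
)

print(same_object, changed_everywhere)
```

Both flags come out true, which matches the identical arrays printed by the script in the question.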
I would build it in a dictionary comprehension instead, creating a new array for each element with a helper method:
    @staticmethod
    def create_element():
        board_weight_instance = np.zeros((3,3))
        board_weight_instance[:] = 0.5  # simpler method
        return board_weight_instance

    board_weight = {i: self.create_element() for i in range(0,5)}
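To check that this really gives independent arrays, here is a standalone sketch of the same idea. It assumes a module-level helper called `make_weight` (a hypothetical name standing in for the static method) and uses `np.full` in place of `np.zeros` plus assignment.

```python
import numpy as np

def make_weight():
    # Hypothetical helper: returns a brand-new (3, 3) array filled with 0.5.
    return np.full((3, 3), 0.5)

# The helper is called once per key, so each key gets its own array.
board_weight = {i: make_weight() for i in range(5)}

# Reward move #2 at board position [0, 1].
board_weight[2][0, 1] *= 1.05

# All other keys still hold the untouched starting value at [0, 1].
only_move_2_changed = all(
    board_weight[i][0, 1] == 0.5 for i in range(5) if i != 2
)
print(only_move_2_changed)
```

The key point is that the expression creating the array is evaluated inside the comprehension, once per iteration, rather than once before the dictionary is built.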
In that case, why use a dictionary at all when you can use a list? The keys are just consecutive integers, so a list avoids hashing and is faster to process:
    board_weight = [self.create_element() for _ in range(0,5)]
You can access it the same way, e.g. board_weight[i][row,col], so update_weights() does not need to change.
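As a rough sketch of the list version together with the update loop from the question: the move list below is made-up example data, and `enumerate` replaces the `range(len(moves))` indexing purely for readability.

```python
import numpy as np

# One independent (3, 3) array per move number, stored in a list.
board_weight = [np.full((3, 3), 0.5) for _ in range(5)]

# Made-up example: three stored moves of a winning game, as [row, col].
moves = [[1, 1], [0, 2], [2, 0]]
factor = 1.05  # 1.05 for a win, 0.95 for a loss, as in update_weights()

# Same indexing as with the dictionary: move number first, then [row, col].
for i, (row, col) in enumerate(moves):
    board_weight[i][row, col] *= factor

print(board_weight[0])
print(board_weight[3])
```

Only the array for each played move changes, and only at the position of that move; the arrays for moves #3 and #4, which were never played in this example, keep 0.5 everywhere.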