[MLAgents][Unity] Updating Puppo the Corgi to the Latest ML-Agents

Back in 2018 or so, I tried ML-Agents when it first came out, then got busy with work and forgot about it.

My interest came back recently, so I wanted to run some tests with the latest version.

One of the sample projects back in 2018 was Puppo, The Corgi:

https://blog.unity.com/kr/technology/puppo-the-corgi-cuteness-overload-with-the-unity-ml-agents-toolkit


The project code was still available (almost four years later..), and I needed a 3D model for testing anyway.

So I decided to update this project and use it to try out machine learning.

First, create a basic project.

(Based on Unity 2021.2.5f1)

Then bring over the resources you need from the old project (scenes, prefabs, source code, and so on).

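Naturally, compile errors pour in right away.

Older versions had the concepts of an Academy and Brains: you created an Academy and configured Brains to drive the training.

In the latest version that structure has changed, and it seems you only need to implement an Agent and configure it.

(Much simpler.)

So for now, delete the GameObjects and code related to the Academy and Brains that produce compile errors.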

Next, add the required packages under Window -> Package Manager:

ML Agents

TextMeshPro

Cinemachine

Post Processing

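Then import the new ML-Agents into the scripts (see the Unity.MLAgents using directives at the top of the code below).

After that, update whatever else still fails to compile.

The updated DogAgent.cs

(Almost all of the errors were in DogAgent.cs, so that is the only file attached here.)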

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using Unity.MLAgents.Actuators;
using Unity.MLAgentsExamples;

public class DogAgent : Agent
{
    [HideInInspector]
    // The target the dog will run towards.
    public Transform target; 

    // These items should be set in the inspector
    [Header("Body Parts")] 
    public Transform mouthPosition;
    public Transform body;
    public Transform leg0_upper;
    public Transform leg1_upper;
    public Transform leg2_upper;
    public Transform leg3_upper;
    public Transform leg0_lower;
    public Transform leg1_lower;
    public Transform leg2_lower;
    public Transform leg3_lower;

    // These determine how the dog should be able to rotate around the y axis
    [Header("Body Rotation")] 
    public float maxTurnSpeed;
    public ForceMode turningForceMode;

    [Header("Sounds")]
    // If true, the dog will bark. 
    // Note : This should be turned off during training...unless you want to hear a dozen dogs barking for hours
    public bool canBark;
    // The clips to use for the barks of the dog
    public List<AudioClip> barkSounds = new List<AudioClip>();
    AudioSource audioSourceSFX;

    JointDriveController jdController;
    
    [HideInInspector]
    // This vector gives the position of the target relative to the position of the dog
    public Vector3 dirToTarget;
    // This float determines how much the dog will be rotating around the y axis
    float rotateBodyActionValue; 
    // Counts the number of steps until the next agent's decision will be made
    int decisionCounter;

    // [HideInInspector]
    public bool runningToItem;
    // [HideInInspector]
    public bool returningItem;

    void Awake()
    {
        // Audio Setup
        audioSourceSFX = body.gameObject.AddComponent<AudioSource>();
        audioSourceSFX.spatialBlend = .75f;
        audioSourceSFX.minDistance = .7f;
        audioSourceSFX.maxDistance = 5;
        if(canBark)
        {
            StartCoroutine(BarkBarkGame());
        }

        //Joint Drive Setup
        jdController = GetComponent<JointDriveController>();
        jdController.SetupBodyPart(body);
        jdController.SetupBodyPart(leg0_upper);
        jdController.SetupBodyPart(leg0_lower);
        jdController.SetupBodyPart(leg1_upper);
        jdController.SetupBodyPart(leg1_lower);
        jdController.SetupBodyPart(leg2_upper);
        jdController.SetupBodyPart(leg2_lower);
        jdController.SetupBodyPart(leg3_upper);
        jdController.SetupBodyPart(leg3_lower);

    }

    public override void OnEpisodeBegin()
    {
        foreach(var bodyPart in jdController.bodyPartsDict.Values)
        {
            bodyPart.Reset(bodyPart);
        }

        body.rotation = Quaternion.Euler(0, Random.Range(0f, 360f), 0f);
    }

    public override void Heuristic(in ActionBuffers actionsOut)
    {
        // No manual (keyboard) control is implemented; actionsOut is left untouched.
    }

    /// <summary>
    /// Add relevant information on each body part to observations.
    /// </summary>
    public void CollectObservationBodyPart(VectorSensor sensor, BodyPart bp)
    {
        sensor.AddObservation(bp.groundContact.touchingGround ? 1 : 0); // Is this bp touching the ground

        if(bp.rb.transform != body)
        {
            sensor.AddObservation(bp.currentXNormalizedRot);
            sensor.AddObservation(bp.currentYNormalizedRot);
            sensor.AddObservation(bp.currentZNormalizedRot);
            sensor.AddObservation(bp.currentStrength/jdController.maxJointForceLimit);
        }
    }

    /// <summary>
    /// The method the agent uses to collect information about the environment
    /// </summary>
    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(dirToTarget.normalized);
        sensor.AddObservation(body.localPosition);
        sensor.AddObservation(jdController.bodyPartsDict[body].rb.velocity);
        sensor.AddObservation(jdController.bodyPartsDict[body].rb.angularVelocity);
        sensor.AddObservation(body.forward); //the capsule is rotated so this is local forward
        sensor.AddObservation(body.up); //the capsule is rotated so this is local up

        foreach (var bodyPart in jdController.bodyPartsDict.Values)
        {
            CollectObservationBodyPart(sensor, bodyPart);
        }
    }


    /// <summary>
    /// Rotates the body of the agent around the y axis
    /// </summary>
    /// <param name="act"> The amount by which the agent must rotate</param>
    void RotateBody(float act)
    {
        float speed = Mathf.Lerp(0, maxTurnSpeed, Mathf.Clamp(act, 0, 1));
        Vector3 rotDir = dirToTarget; 
        rotDir.y = 0;
        // Adds a force on the front of the body
        jdController.bodyPartsDict[body].rb.AddForceAtPosition(
            rotDir.normalized * speed * Time.deltaTime, body.forward, turningForceMode); 
        // Adds a force on the back of the body
        jdController.bodyPartsDict[body].rb.AddForceAtPosition(
            -rotDir.normalized * speed * Time.deltaTime, -body.forward, turningForceMode); 
    }

    /// <summary>
    /// Allows the dog to bark
    /// </summary>
    /// <returns></returns>
    public IEnumerator BarkBarkGame()
    {   

        while(true)
        {
            //When we're returning the stick we should not bark because we have
            //a stick in our mouth :|>
            //Also skip barking when no clips are assigned, to avoid an out-of-range index.
            if(!returningItem && barkSounds.Count > 0)
            {
                //Choose one of the barking clips at random and play it.
                audioSourceSFX.PlayOneShot(barkSounds[Random.Range(0, barkSounds.Count)], 1);
            }
            //Wait for a random amount of time (between 1 & 10 sec) until we bark again.
            //Float arguments are used so the upper bound of 10 is actually reachable.
            yield return new WaitForSeconds(Random.Range(1f, 10f));
        }
    }

    /// <summary>
    /// The agent's action method. Is called at each decision and allows the agent to move
    /// </summary>
    /// <param name="actions"> The actions that were determined by the policy</param>
    public override void OnActionReceived(ActionBuffers actions)
    {
        var bpDict = jdController.bodyPartsDict;
        var vectorAction = actions.ContinuousActions;

        // Update joint drive target rotation
        bpDict[leg0_upper].SetJointTargetRotation(vectorAction[0], vectorAction[1], 0);
        bpDict[leg1_upper].SetJointTargetRotation(vectorAction[2], vectorAction[3], 0);
        bpDict[leg2_upper].SetJointTargetRotation(vectorAction[4], vectorAction[5], 0);
        bpDict[leg3_upper].SetJointTargetRotation(vectorAction[6], vectorAction[7], 0);
        bpDict[leg0_lower].SetJointTargetRotation(vectorAction[8], 0, 0);
        bpDict[leg1_lower].SetJointTargetRotation(vectorAction[9], 0, 0);
        bpDict[leg2_lower].SetJointTargetRotation(vectorAction[10], 0, 0);
        bpDict[leg3_lower].SetJointTargetRotation(vectorAction[11], 0, 0);

        // Update joint drive strength
        bpDict[leg0_upper].SetJointStrength(vectorAction[12]);
        bpDict[leg1_upper].SetJointStrength(vectorAction[13]);
        bpDict[leg2_upper].SetJointStrength(vectorAction[14]);
        bpDict[leg3_upper].SetJointStrength(vectorAction[15]);
        bpDict[leg0_lower].SetJointStrength(vectorAction[16]);
        bpDict[leg1_lower].SetJointStrength(vectorAction[17]);
        bpDict[leg2_lower].SetJointStrength(vectorAction[18]);
        bpDict[leg3_lower].SetJointStrength(vectorAction[19]);

        rotateBodyActionValue = vectorAction[20];
    }    

    /// <summary>
    /// Update the direction vector to the current target;
    /// </summary>
    public void UpdateDirToTarget()
    {
        dirToTarget = target.position - jdController.bodyPartsDict[body].rb.position;

    }

    void FixedUpdate()
    {
        UpdateDirToTarget();

        if (decisionCounter == 0)
        {
            decisionCounter = 3;
            RequestDecision();
        }
        else
        {
            decisionCounter--;
        }

        RotateBody(rotateBodyActionValue); 

        // Energy Conservation
        // The dog is penalized by how strongly it rotates towards the target.
        // Without this penalty the dog tries to rotate as fast as it can at all times.
        var bodyRotationPenalty = -0.001f * rotateBodyActionValue;
        AddReward(bodyRotationPenalty);

        // Reward for moving towards the target
        RewardFunctionMovingTowards();
        // Penalty for time
        RewardFunctionTimePenalty();
    }
	
    /// <summary>
    /// Reward moving towards target & Penalize moving away from target.
    /// This reward incentivizes the dog to run as fast as it can towards the target,
    /// and disincentivizes running away from the target.
    /// </summary>
    void RewardFunctionMovingTowards()
    {

        float movingTowardsDot = Vector3.Dot(
            jdController.bodyPartsDict[body].rb.velocity, dirToTarget.normalized);
        AddReward(0.01f * movingTowardsDot);
    }


    /// <summary>
    /// Time penalty
    /// The dog gets a penalty each step so that it tries to finish
    /// as quickly as possible.
    /// </summary>
    void RewardFunctionTimePenalty()
    {
        AddReward(- 0.001f);  //-0.001f chosen by experimentation.
    }

}
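
To make the FixedUpdate logic above easier to reason about, here is a small Python sketch (not part of the Unity project) that mirrors two pieces of it: the decisionCounter countdown and the three reward terms added on each step. The function names are made up for illustration, and plain tuples stand in for Unity's Vector3.

```python
import math


def decision_steps(num_fixed_updates, reset_value=3):
    """Return the FixedUpdate indices at which RequestDecision() would fire.

    Mirrors the decisionCounter logic in DogAgent.FixedUpdate: the counter
    starts at 0, is reset to `reset_value` when a decision is requested,
    and is decremented on every other step.
    """
    counter = 0
    fired = []
    for step in range(num_fixed_updates):
        if counter == 0:
            counter = reset_value
            fired.append(step)
        else:
            counter -= 1
    return fired


def step_reward(velocity, dir_to_target, rotate_action):
    """Sum of the three reward terms DogAgent adds in one FixedUpdate."""
    length = math.sqrt(sum(c * c for c in dir_to_target))
    # Treat a zero-length direction as the zero vector, like Vector3.normalized.
    if length > 0:
        unit = tuple(c / length for c in dir_to_target)
    else:
        unit = (0.0, 0.0, 0.0)
    moving_towards = sum(v * u for v, u in zip(velocity, unit))
    rotation_penalty = -0.001 * rotate_action
    time_penalty = -0.001
    return rotation_penalty + 0.01 * moving_towards + time_penalty


print(decision_steps(12))
print(step_reward((2.0, 0.0, 0.0), (10.0, 0.0, 0.0), 0.5))
```

With a reset value of 3, a decision is requested on every fourth physics step (steps 0, 4, 8, ...). In the second call, a dog moving at 2 units per second straight at the target with a rotate action of 0.5 gets roughly 0.0185 for that step.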

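Once the compile errors are gone, open the FetchTrainingScene scene.

Keep just one training group and delete all the other existing groups.

Under the training group there is a CORGI object. Select it, then go to the Inspector and reconnect the references that were lost.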
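
In current ML-Agents the observation and action sizes are entered on the agent's Behavior Parameters component, so it helps to know what DogAgent.cs actually produces. The tally below is a sketch in Python, not project code. It assumes that bodyPartsDict holds exactly the nine body parts set up in Awake, that each Vector3 observation counts as 3 values, and that the ground-contact flag, the normalized rotations, and the strength each count as 1.

```python
VEC3 = 3  # each Vector3 observation contributes three floats

# CollectObservations: dirToTarget, localPosition, velocity,
# angularVelocity, forward, up (six Vector3 values).
body_level = 6 * VEC3

# Awake registers the body plus 4 upper legs and 4 lower legs.
num_body_parts = 1 + 4 + 4

# Every body part reports one ground-contact flag.
ground_contact = num_body_parts * 1

# Every part except the body adds 3 normalized rotations + 1 strength.
joint_state = (num_body_parts - 1) * 4

observation_size = body_level + ground_contact + joint_state

# OnActionReceived reads continuous actions at indices 0..20:
# 8 upper-leg rotations, 4 lower-leg rotations, 8 strengths, 1 body rotation.
continuous_actions = 8 + 4 + 8 + 1

print(observation_size, continuous_actions)
```

Under those assumptions the vector observation size comes to 59 and the number of continuous actions to 21. If the agent logs a mismatch when the scene runs, these are the two numbers to check first.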

Now press Play and you can see the corgi twitching.

(It has not been trained yet, so it mostly stays put..)

For training, duplicate the training group above several times.

Then copy a yaml file from the samples or from the old project, rename it, and adjust the training settings to fit.
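
If no sample file is at hand, the config can also be written from scratch. The Python sketch below writes a PPO config using the standard ML-Agents trainer settings keys. The behavior name "Corgi" is a placeholder that has to match the Behavior Name set on the agent in Unity, and the hyperparameter values are generic starting points that I am assuming here, not the settings used in this post.

```python
from pathlib import Path

# Placeholder: must equal the Behavior Name on the agent's Behavior Parameters.
BEHAVIOR_NAME = "Corgi"

# Assumed starting values for a continuous-control PPO run; tune as needed.
CONFIG_TEMPLATE = """\
behaviors:
  {name}:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 512
      num_layers: 3
    reward_signals:
      extrinsic:
        gamma: 0.995
        strength: 1.0
    max_steps: 5000000
    time_horizon: 1000
    summary_freq: 30000
"""


def write_config(path="config/Corgi.yaml", name=BEHAVIOR_NAME):
    """Write the trainer config to `path` and return the resulting Path."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(CONFIG_TEMPLATE.format(name=name), encoding="utf-8")
    return target
```

Calling write_config() from the folder where mlagents-learn will be run creates config/Corgi.yaml. A larger max_steps means a longer training run.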

Next, launch Anaconda and activate the mlagents environment.

https://lhh3520.tistory.com/369?category=839963


Then run the mlagents-learn command with the yaml file created above.

mlagents-learn config/Corgi.yaml --run-id=Corgi01

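When it runs, the Unity logo is drawn in the console and the trainer is ready and waiting for the editor to connect.

Now go back to the editor in this state and press Play to start training.

Wiggling around, apparently learning something.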

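When training finishes, an .onnx file is generated in the results folder inside the project folder. (It used to be .nn, but that has changed.)

You can also view how the training went as graphs. Run the following in the console:

tensorboard --logdir results --port 6006

Then open

http://localhost:6006/

in a web browser to see TensorBoard.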

With training done, go back to Unity and move the .onnx file generated earlier into the project folder.
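
Moving the file by hand works fine. If you retrain often, the step can be scripted, and the Python sketch below is one way to do it. It assumes that the models for a run sit somewhere under results/<run-id>/, that the most recently written .onnx is the one you want, and that Assets/Models is the destination. That folder name is hypothetical, and the helper copy_latest_model is not part of ML-Agents.

```python
from pathlib import Path
import shutil


def copy_latest_model(results_dir, run_id, dest_dir):
    """Copy the most recently modified .onnx of a run into dest_dir.

    Returns the path of the copied file.
    """
    run_dir = Path(results_dir) / run_id
    # Search the run folder recursively and order candidates by modification time.
    candidates = sorted(run_dir.rglob("*.onnx"), key=lambda p: p.stat().st_mtime)
    if not candidates:
        raise FileNotFoundError(f"no .onnx file found under {run_dir}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # copy2 keeps timestamps and returns the path of the new file.
    return Path(shutil.copy2(candidates[-1], dest))
```

For the run started above, the call would look like copy_latest_model("results", "Corgi01", "Assets/Models"). Unity imports the file the next time the editor gets focus.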

Then assign the trained model to the DogAgent.

Now when you run it, the corgi is a little clumsy(?) but fetches just fine.

(More training steps or a longer training time would make it much better.)
