Tuesday, January 28, 2020

SANS Holiday Hack 2019 - Objective 8: Bypassing the Frido Sleigh CAPTEHA

Objective 8: Bypassing the Frido Sleigh CAPTEHA




To complete this objective, you need to go to Krampus' lair.  If you completed Objective 7, you can use the Steam Tunnels in your badge to go directly there.  (Fairly certain you have to complete Krampus' challenge to do this, unless you're a web dev wizard.)  In case you didn't, go to the dorm, through the controlled-access door, and all the way to the right until you reach an open door.  Go through that door, into the closet, open the lock, and follow the tunnels until you reach Krampus.  He will tell you the objective.  Alabaster Snowball gives the hints for this challenge.  The following link is the Chris Davis talk that Alabaster mentions:  https://www.youtube.com/watch?v=jmVPLwjm_zs&feature=youtu.be



Krampus gives you the details of the challenge, along with some images and a script he wrote, capteha_api.py, for sending requests to the CAPTEHA via its API.  Knowing Python helps, because both the machine learning demo in Mr. Davis's Git repository and Krampus's capteha_api.py are written in Python.  It's not absolutely necessary to know Python, though.  I don't.  Thankfully I took programming courses in different languages, so I can read some of it, and I Googled the rest.  I'm using the Slingshot image.  Linux seems to have less overhead than Windows and usually a smaller processing footprint, which matters when processing all the images and sending the request.  Note that this challenge does not have to be solved in the terminal.  Go here:  https://fridosleigh.com/.  This is what you're supposed to bypass, and it's helpful to see how it works.

The first thing you should do is watch the talk.  It is the single most helpful resource for solving this challenge.  The second is to download the GitHub repository he mentions in his talk, github.com/chrisjd20/img_rec_tf_ml_demo.  Clone or download it just as the directions state:
git clone https://github.com/chrisjd20/img_rec_tf_ml_demo.git

cd img_rec_tf_ml_demo 
sudo apt install python3 python3-pip -y
sudo python3 -m pip install --upgrade pip
sudo python3 -m pip install --upgrade setuptools
sudo python3 -m pip install --upgrade tensorflow==1.15 
sudo python3 -m pip install tensorflow_hub




 


There are multiple ways of solving this challenge.  The following is one way; there may be better, more efficient ways.

Download the images from https://downloads.elfu.org/capteha_images.tar.gz and save them in a directory called "training_images".  Inside it there should be separate folders named after the items you want to train the machine to recognize.  Each folder should contain the images corresponding to its name: Candy Canes in the Candy Canes folder, Christmas Trees in the Christmas Trees folder, and so on.
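Before kicking off a long training run, it's worth a quick sanity check that the folder layout is what retrain.py expects, since each subfolder name becomes a label.  A minimal sketch, assuming you extracted the tarball into ./training_images (adjust the path to wherever yours lives):

```python
import os

def list_training_classes(image_dir="./training_images"):
    """Print and return the class (folder) names retrain.py will learn from.

    The default path is an assumption; point it at your own
    training_images directory.  A missing or misnamed folder here means
    a missing or misnamed label later.
    """
    classes = [d for d in sorted(os.listdir(image_dir))
               if os.path.isdir(os.path.join(image_dir, d))]
    for c in classes:
        count = len(os.listdir(os.path.join(image_dir, c)))
        print('{}: {} images'.format(c, count))
    return classes
```

If a class prints with zero images, fix the folder before training rather than after.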

Next, run

python3 retrain.py --image_dir ./training_images/

Keep in mind that the . in front of ./training_images means the current directory.  Depending on where you saved it, you might have to run it with the full directory path instead, like so:  /home/slingshot/Desktop/training_images/

That Python program will create temporary files in the /tmp directory that are necessary to process the images sent from the CAPTEHA, so that your machine can recognize each image and send back the correct response.  Depending on how your machine is configured, you may have to retrain after a reboot, because /tmp is often cleared at logoff.  If you don't want the trained model files to be deleted, reconfigure the script to save them somewhere other than /tmp.
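Another way to survive a reboot is to copy the trained model out of /tmp right after training finishes.  A small sketch, assuming the demo's default /tmp/retrain_tmp output location; the destination is just an example path, and you'd then point the API script at the copies:

```python
import os
import shutil

def backup_model(src='/tmp/retrain_tmp',
                 dest=os.path.expanduser('~/capteha_model')):
    """Copy the trained graph and labels somewhere that survives a reboot.

    src is assumed to be retrain.py's output directory; dest is an
    arbitrary example.  Returns the list of files actually copied.
    """
    os.makedirs(dest, exist_ok=True)
    copied = []
    for name in ('output_graph.pb', 'output_labels.txt'):
        path = os.path.join(src, name)
        if os.path.exists(path):
            shutil.copy2(path, dest)
            copied.append(name)
    return copied
```

Remember to update the load_graph and load_labels paths in the API script to match wherever you copied the files.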

Also, if you get an error when trying to run the Python file, make sure your current user has permission to execute it.

Next, modify the CAPTEHA API file that Krampus gives you.  The commented Python code below is what I did.  Basically, all I did was fill in the API script Krampus gave us.  He made it fairly clear what was needed by putting the following in his code:

    '''
    MISSING IMAGE PROCESSING AND ML IMAGE PREDICTION CODE GOES HERE
    '''


I borrowed code from the machine learning image-processing script that Chris Davis provides in the GitHub repository and placed it where Krampus indicated in the API script.

The files that the CAPTEHA downloads and displays are saved into an "unknown_images" directory.  (Chris already provided the code, might as well use it.)  So, create that directory and keep track of its path; the script needs it.  In this version of the API script, the unknown_images directory has to be deleted and recreated each time the script is run so that it doesn't guess the photos from previous attempts.  This could be added to the script, but I created a one-liner that runs the commands one after the other, and ran it between attempts:  rm -rf /tmp/unknown_images && mkdir /tmp/unknown_images && ./capteha_api.py
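If you'd rather not run that one-liner by hand, the same cleanup can live at the top of the script itself.  A minimal sketch; the path is an assumption, so match it to whichever unknown_images directory your copy of the script uses:

```python
import os
import shutil

def reset_unknown_images(path='/tmp/unknown_images'):
    """Wipe and recreate the download directory between CAPTEHA attempts.

    Mirrors the rm -rf / mkdir one-liner so images from a previous run
    can never be classified again by mistake.  The default path is an
    assumption.
    """
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path)
    return path
```

Calling this before the image-download loop makes every run start from an empty directory.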


#!/usr/bin/env python3
# Fridosleigh.com CAPTEHA API - Made by Krampus Hollyfeld
import requests
import json
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)
import numpy as np
import threading
import queue
import time
import sys
import base64


def load_labels(label_file):
    label = []
    proto_as_ascii_lines = tf.gfile.GFile(label_file).readlines()
    for l in proto_as_ascii_lines:
        label.append(l.rstrip())
    return label

def predict_image(q, sess, graph, image_bytes, img_full_path, uuid, labels, unknown_images_dir, input_operation, output_operation):
    image = read_tensor_from_image_bytes(image_bytes)
    results = sess.run(output_operation.outputs[0], {
        input_operation.outputs[0]: image
    })
    results = np.squeeze(results)
    prediction = results.argsort()[-5:][::-1][0]
    # You might want to make it clear what your variables are referring to.
    # I was just trying stuff and came across an answer by pure luck.
    # These splits pull the image's uuid (its filename) back out of the
    # full path: 'dir/uuid' -> '/uuid' -> ['', 'uuid'] -> 'uuid'
    bob = img_full_path.split(unknown_images_dir)
    bobo = bob[1].split('/')
    bobbo = bobo[1]
    q.put({'prediction':labels[prediction].title(), 'uuid':bobbo})

def load_graph(model_file):
    graph = tf.Graph()
    graph_def = tf.GraphDef()
    with open(model_file, "rb") as f:
        graph_def.ParseFromString(f.read())
    with graph.as_default():
        tf.import_graph_def(graph_def)
    return graph
 
def main():
    yourREALemailAddress = "myrealemailaddress@washere.com"

    # Creating a session to handle cookies
    s = requests.Session()
    url = "https://fridosleigh.com/"

    json_resp = json.loads(s.get("{}api/capteha/request".format(url)).text)
    b64_images = json_resp['images'] # A list of dictionaries, each containing the keys 'base64' and 'uuid'
    challenge_image_type = json_resp['select_type'].split(',') # The image types the CAPTEHA challenge is looking for.
    challenge_image_types = [challenge_image_type[0].strip(), challenge_image_type[1].strip(), challenge_image_type[2].replace(' and ','').strip()] # cleaning and formatting
    print(challenge_image_types)

    for result in b64_images:
        uuid = result['uuid']
        # Each image is saved with its uuid as the filename.  Run the script
        # from (or write into) the unknown_images directory so the files land
        # where the classification loop below looks for them.
        with open(uuid, "wb") as fh:
            fh.write(base64.b64decode(result['base64']))
    '''
    MISSING IMAGE PROCESSING AND ML IMAGE PREDICTION CODE GOES HERE
    '''


    # Loading the Trained Machine Learning Model created from running retrain.py on the training_images directory
    graph = load_graph('/tmp/retrain_tmp/output_graph.pb')
    labels = load_labels("/tmp/retrain_tmp/output_labels.txt")

    # Load up our session
    input_operation = graph.get_operation_by_name("import/Placeholder")
    output_operation = graph.get_operation_by_name("import/final_result")
    sess = tf.compat.v1.Session(graph=graph)

    # Can use queues and threading to speed up the processing
    q = queue.Queue()

    unknown_images_dir = '/home/slingshot/Desktop/unknown_images'
    print(unknown_images_dir)
    unknown_images = os.listdir(unknown_images_dir)
    print(unknown_images)

    # Going to iterate over each of our images.
    for image in unknown_images:
        img_full_path = '{}/{}'.format(unknown_images_dir, image)
        print('Processing Image {}'.format(img_full_path))
        # We don't want to process too many images at once. 10 threads max
        while len(threading.enumerate()) > 10:
            time.sleep(0.00001)        
    #predict_image function is expecting png image bytes so we read image as 'rb' to get a bytes object
        image_bytes = open(img_full_path,'rb').read()
        threading.Thread(target=predict_image, args=(q, sess, graph, image_bytes, img_full_path, uuid, labels, 
        unknown_images_dir, input_operation, output_operation)).start()
    
    print('Waiting For Threads to Finish...')
    while q.qsize() < len(unknown_images):
        time.sleep(0.0001)
    
    #getting a list of all threads returned results
    prediction_results = [q.get() for x in range(q.qsize())]
    final_answer_list= []
    #do something with our results... Like print them to the screen.
    for prediction in prediction_results:
        if(prediction['prediction'] in challenge_image_types):
            print(prediction['prediction'])
            print(prediction['uuid'])
            final_answer_list.append(prediction['uuid'])
    final_answer = ','.join(final_answer_list)
    # final_answer should be JUST a csv list of the image uuids ML predicted to match the challenge_image_types.
    
    json_resp = json.loads(s.post("{}api/capteha/submit".format(url), data={'answer':final_answer}).text)
    if not json_resp['request']:
        # If it fails just run again. ML might get one wrong occasionally
        print('FAILED MACHINE LEARNING GUESS')
        print('--------------------\nOur ML Guess:\n--------------------\n{}'.format(final_answer))
        print('--------------------\nServer Response:\n--------------------\n{}'.format(json_resp['data']))
        sys.exit(1)
    print('CAPTEHA Solved!')
    # If we get to here, we are successful and can submit a bunch of entries till we win
    userinfo = {
        'name':'Krampus Hollyfeld',
        'email':yourREALemailAddress,
        'age':180,
        'about':"Cause they're so flippin yummy!",
        'favorites':'thickmints'
    }
    # If we win the once-per minute drawing, it will tell us we were emailed.
    # Should be no more than 200 times before we win. If more, something's wrong.
    entry_response = ''
    entry_count = 1
    while yourREALemailAddress not in entry_response and entry_count < 200:
        print('Submitting lots of entries until we win the contest! Entry #{}'.format(entry_count))
        entry_response = s.post("{}api/entry".format(url), data=userinfo).text
        entry_count += 1
    print(entry_response) 
if __name__ == "__main__":
    main()

The key to this challenge was persistence.  If you don't know Python and/or machine learning, it can be frustrating, but you keep trying until you get it.  Notice from the image below that after I got the syntax right, it took 101 submissions before an entry went through, and that's not counting the many times I had to modify the code because of a syntax or logic error.  Don't give up.
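The "if it fails, just run again" advice from the script's failure branch can itself be automated.  A sketch of a retry wrapper, assuming the solver script (named capteha_api.py here) exits non-zero on a failed guess, as the version above does via sys.exit(1):

```python
import subprocess
import sys

def run_until_solved(script='capteha_api.py', max_tries=20):
    """Re-run the solver until it exits cleanly (CAPTEHA solved).

    The script name and try limit are assumptions; adjust to taste.
    Returns the attempt number that succeeded, or None if it never did.
    """
    for attempt in range(1, max_tries + 1):
        print('Attempt {}'.format(attempt))
        if subprocess.call([sys.executable, script]) == 0:
            return attempt
    return None
```

If you also reset the unknown_images directory inside the script, this loop can run completely unattended.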

