Howto BF-STREAMS-001: Accessing Native Data From MLPro

Ver. 1.0.2 (2023-02-02)

This module demonstrates the use of the native generic data streams provided by MLPro. To this end, all data streams offered by the MLPro stream provider are determined and iterated.

You will learn:

  1. How to access MLPro’s native data streams.

  2. How to iterate the instances of a native stream.

  3. How to access feature data of a native stream.

Executable code

## -------------------------------------------------------------------------------------------------
## -- Project : MLPro - A Synoptic Framework for Standardized Machine Learning Tasks
## -- Package : mlpro.bf.examples
## -- Module  : howto_bf_streams_001_accessing_native_data_from_mlpro.py
## -------------------------------------------------------------------------------------------------
## -- History :
## -- yyyy-mm-dd  Ver.      Auth.    Description
## -- 2022-11-08  1.0.0     DA       Creation
## -- 2022-12-14  1.0.1     DA       Corrections
## -- 2023-02-02  1.0.2     DA       Correction of time measurement
## -------------------------------------------------------------------------------------------------

"""
Ver. 1.0.2 (2023-02-02)

This module demonstrates the use of the native generic data streams provided by MLPro. To this end,
all data streams offered by the MLPro stream provider are determined and iterated.

You will learn:

1) How to access MLPro's native data streams.

2) How to iterate the instances of a native stream.

3) How to access feature data of a native stream.

"""


from datetime import datetime
from mlpro.bf.streams.streams import *
from mlpro.bf.various import Log



# 0 Prepare Demo/Unit test mode
if __name__ == '__main__':
    logging     = Log.C_LOG_ALL
else:
    logging     = Log.C_LOG_NOTHING


# 1 Create the stream provider for MLPro's native data streams
mlpro = StreamProviderMLPro(p_logging=logging)


# 2 Determine native data streams provided by MLPro
for stream in mlpro.get_stream_list( p_logging=logging ):
    stream.switch_logging( p_logging=logging )
    try:
        labels = stream.get_label_space().get_num_dim()
    except Exception:
        # Stream without labels (no usable label space)
        labels = 0

    stream.log(Log.C_LOG_TYPE_W, 'Features:', stream.get_feature_space().get_num_dim(), ', Labels:', labels, ', Instances:', stream.get_num_instances() )

if __name__ == '__main__':
    input('\nPress ENTER to iterate all streams dark...\n')


# 3 Performance test: iterate all data streams dark (with logging switched off) and measure the time
for stream in mlpro.get_stream_list( p_logging=logging ):
    stream.switch_logging( p_logging=logging )
    stream.log(Log.C_LOG_TYPE_W, 'Number of instances:', stream.get_num_instances() )
    stream.switch_logging( p_logging=Log.C_LOG_NOTHING )

    # 3.1 Iterate all instances of the stream and fetch their feature values
    tp_start = datetime.now()
    myiterator = iter(stream)
    for curr_instance in myiterator:
        curr_data = curr_instance.get_feature_data().get_values()

    # 3.2 Compute duration and throughput; the extra microsecond avoids a
    #     division by zero if the clock does not register any elapsed time
    tp_end       = datetime.now()
    duration     = tp_end - tp_start
    duration_sec = ( duration.seconds * 1000000 + duration.microseconds + 1 ) / 1000000
    rate         = myiterator.get_num_instances() / duration_sec

    myiterator.switch_logging( p_logging=logging )
    myiterator.log(Log.C_LOG_TYPE_W, 'Done in', round(duration_sec,3), ' seconds (throughput =', round(rate), 'instances/sec)')    
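The script times each loop with datetime.now() and adds one microsecond to the duration to avoid dividing by zero. For very short runs this can yield misleading rates, as the results below show. A possible alternative, sketched here under the assumption that only the Python standard library is used, times the loop with the high-resolution performance counter time.perf_counter() and reports an infinite rate when no elapsed time is measured. measure_throughput is a hypothetical helper for illustration, not part of MLPro.

```python
import time
from typing import Any, Callable, Iterable, Optional, Tuple


def measure_throughput(
    instances: Iterable[Any],
    process: Optional[Callable[[Any], Any]] = None,
) -> Tuple[int, float, float]:
    """Iterate 'instances' once and return (count, seconds, instances per second).

    Hypothetical helper for illustration; it is not part of MLPro.
    """
    count = 0
    tp_start = time.perf_counter()  # performance counter with the highest available resolution
    for instance in instances:
        if process is not None:
            process(instance)  # e.g. fetch the feature values of the instance
        count += 1
    duration_sec = time.perf_counter() - tp_start

    # Report an infinite rate instead of distorting the duration with a fixed offset
    rate = count / duration_sec if duration_sec > 0 else float("inf")
    return count, duration_sec, rate


if __name__ == "__main__":
    count, duration_sec, rate = measure_throughput(range(1000), process=lambda x: x * 2)
    print(f"{count} instances in {duration_sec:.6f} s ({rate:.0f} instances/sec)")
```

Applied to the script above, a call could look like measure_throughput(iter(stream), lambda inst: inst.get_feature_data().get_values()), reusing only the stream calls already shown in step 3.1.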

Results

2023-02-11  22:40:18.898725  I  Stream Provider "MLPro": Instantiated
2023-02-11  22:40:18.898725  I  Stream "Random 10D x 1000": Instantiated
2023-02-11  22:40:18.898725  I  Stream "Double Spiral 2D x 721": Instantiated
2023-02-11  22:40:18.898725  I  Stream "Static Clouds 2D": Instantiated
2023-02-11  22:40:18.898725  I  Stream "Static Clouds 3D": Instantiated
2023-02-11  22:40:18.898725  I  Stream Provider "MLPro": Getting list of streams...
2023-02-11  22:40:18.898725  I  Stream Provider "MLPro": Number of streams found: 4
2023-02-11  22:40:18.898725  W  Stream "Random 10D x 1000": Features: 10 , Labels: 2 , Instances: 1000
2023-02-11  22:40:18.898725  W  Stream "Double Spiral 2D x 721": Features: 2 , Labels: 0 , Instances: 721
2023-02-11  22:40:18.898725  W  Stream "Static Clouds 2D": Features: 2 , Labels: 0 , Instances: 1000
2023-02-11  22:40:18.898725  W  Stream "Static Clouds 3D": Features: 3 , Labels: 0 , Instances: 2000

Press ENTER to iterate all streams dark...

2023-02-11  22:53:21.191034  I  Stream Provider "MLPro": Getting list of streams...
2023-02-11  22:53:21.191034  I  Stream Provider "MLPro": Number of streams found: 4
2023-02-11  22:53:21.191034  W  Stream "Random 10D x 1000": Number of instances: 1000
2023-02-11  22:53:21.222301  W  Stream "Random 10D x 1000": Done in 0.031  seconds (throughput = 31982 instances/sec)
2023-02-11  22:53:21.222301  W  Stream "Double Spiral 2D x 721": Number of instances: 721
2023-02-11  22:53:21.222301  W  Stream "Double Spiral 2D x 721": Done in 0.0  seconds (throughput = 721000000 instances/sec)
2023-02-11  22:53:21.222301  W  Stream "Static Clouds 2D": Number of instances: 1000
2023-02-11  22:53:21.237922  W  Stream "Static Clouds 2D": Done in 0.016  seconds (throughput = 64012 instances/sec)
2023-02-11  22:53:21.237922  W  Stream "Static Clouds 3D": Number of instances: 2000
2023-02-11  22:53:21.275686  W  Stream "Static Clouds 3D": Done in 0.038  seconds (throughput = 52959 instances/sec)
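The implausible result for "Double Spiral 2D x 721" (0.0 seconds, 721000000 instances/sec) follows directly from the arithmetic in step 3 of the script: if the clock registers no elapsed time, the extra microsecond makes the duration 1 µs and the rate 721 / 0.000001. The sketch below reproduces this calculation. The function name is hypothetical, and the 31267 µs used in the second call is only an assumption taken from the gap between the two log timestamps of "Random 10D x 1000", not the duration actually measured by the script.

```python
from datetime import timedelta


def throughput_as_in_howto(num_instances: int, duration: timedelta):
    """Reproduce the duration/throughput arithmetic of step 3 (hypothetical helper)."""
    # One extra microsecond prevents a division by zero for a zero duration
    duration_sec = (duration.seconds * 1000000 + duration.microseconds + 1) / 1000000
    rate = num_instances / duration_sec
    return round(duration_sec, 3), round(rate)


# Zero measured duration, as for the 721 instances of "Double Spiral 2D x 721"
print(throughput_as_in_howto(721, timedelta(0)))                    # (0.0, 721000000)

# Assumed duration of 31267 µs (gap between the log timestamps of "Random 10D x 1000")
print(throughput_as_in_howto(1000, timedelta(microseconds=31267)))  # (0.031, 31982)
```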

Cross Reference