Using the DataStar Python Library

DataStar users can take advantage of the datastar Python library, which gives users access to DataStar projects, macros, tasks, and connections through Python scripts. This way users can build, access, and run their DataStar workflows programmatically. The library can be used in a user’s own Python environment (local or on the Optilogic platform), and it can also be used in Run Python tasks in a DataStar macro.

In this documentation we will cover how to use the library through multiple examples. At the end, we will step through an end-to-end script that creates a new project, adds a macro to the project, and creates multiple tasks that are added to the macro. The script then runs the macro while giving regular updates on its progress.

Before diving into the details of this article, it is recommended to read this “Setup: Python Scripts for Cosmic Frog and DataStar” article first; it explains what users need to do in terms of setup before they can run Python scripts using the datastar library. To learn more about the DataStar application itself, please see these articles on Optilogic’s Help Center.

Succinct documentation in PDF format of all datastar library functionality can be downloaded here (please note that the long character string at the beginning of the filename is expected). This includes a list of all available properties and methods for the Project, Macro, Task, and Connection classes at the end of the document.

All Python code that is shown in the screenshots throughout this documentation is available in the Appendix, so that you can copy-paste from there if you want to run the exact same code in your own Python environment and/or use these as jumping off points for your own scripts.

Getting Information about Projects and Macros

If you have reviewed the “Setup: Python Scripts for Cosmic Frog and DataStar” article and are set up with your local or online Python environment, we are ready to dive in! First, we will see how we can interrogate existing projects and macros using Python and the datastar library. We want to find out which DataStar projects are already present in the user’s Optilogic account.

  1. If we want to use the datastar Python library, we have to either import it entirely (“import datastar”) or import the classes that we want to use from it. In our case, we are importing all classes via “from datastar import *”. This way each class can be referenced directly, e.g. Project for the project class, Macro for the macro class, etc.
  2. To find out which DataStar projects already exist in our user account, we create a variable named project_list and set it by using a method from the Project class. After typing “Project.”, a list of available methods for the Project class comes up.
  3. We choose the “get_projects” method:

Once the parentheses are typed, hover text comes up with information about this function. It tells us that the outcome of this method will be a list of strings, and the description of the method reads “Retrieve all project names visible to the authenticated user”. Most methods will have similar hover text describing the method, the arguments it takes and their default values, and the output format.

Now that we have a variable that contains the list of DataStar projects in the user account, we want to view the value of this variable:

  1. To view the list of projects, we add a simple print statement to print the value of the project_list variable.
  2. We can run this small script by clicking on the play button at the top right of the script.
  3. The output of the print statement will appear in the terminal at the bottom of the code editor, and we see that this user has 5 DataStar projects in their user account.
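
Put together, the steps above form the following short script (also listed in the Appendix):

from datastar import *

project_list = Project.get_projects()
print(project_list)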

Next, we want to dig one level deeper and for the “Import Historical Shipments” project find out what macros it contains:

  1. We use the connect_to method of the Project class to set our “project” variable to the “Import Historical Shipments” project. Now, whenever the Python code does something with “project” it applies to this “Import Historical Shipments” project.
  2. We create a variable “macro_list” which will contain the list of macros in this project by using the get_macros method of the Project class.
  3. Again, we will write the resulting list to the terminal using a print statement.  
  4. We have run the script and see that this project contains 1 macro named “Import Shipments”.
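
The corresponding code (also available in the Appendix) is:

from datastar import *

project = Project.connect_to("Import Historical Shipments")
macro_list = project.get_macros()
print(macro_list)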

Finally, we will retrieve the tasks this “Import Shipments” macro contains in a similar fashion:

  1. We set the variable “macro” to the “Import Shipments” macro by using the get_macro method in the Project class. Now whenever we reference “macro”, it applies to the “Import Shipments” macro.
  2. The variable task_list will contain the tasks the macro contains by using the get_tasks method available in the Macro class.
  3. We will print the task_list to the terminal.
  4. We have run the script and see that this macro contains 4 tasks: “Start” (each macro contains a Start task as the first task), “Import Raw Shipments”, “Create customers from shipments”, and “Export Customers”.
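
The code for these steps (see also the Appendix) is:

from datastar import *

project = Project.connect_to("Import Historical Shipments")
macro = project.get_macro("Import Shipments")
task_list = macro.get_tasks()
print(task_list)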

In addition, we can have a quick look in the DataStar application to see that the information we are getting from the small scripts above matches what we have in our account in terms of projects (first screenshot below), and the “Import Shipments” macro plus its tasks in the “Import Historical Shipments” project (second screenshot below):

[Screenshot: the DataStar projects in the user’s account]
[Screenshot: the “Import Shipments” macro and its tasks in the “Import Historical Shipments” project]

Besides getting information about projects and macros, other useful methods for projects and macros include:

  • Project.create to create a new project.
  • The project.name and macro.name properties can be set to change the name of a project or macro (need to save to commit the change).
  • The project.description and macro.description properties can be set to change the description of a project or macro (need to save to commit the change).
  • project.save and macro.save to save a changed project / macro name or description.
  • project.delete and macro.delete to remove a project or macro.

Note that when creating new objects (projects, macros, tasks or connections) these are automatically saved. If existing objects are modified, their changes need to be committed by using the save method.
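
For example, here is a minimal sketch of renaming an existing project and updating its description, then committing the change with save (the new name and description used here are just placeholders for illustration):

from datastar import *

project = Project.connect_to("Import Historical Shipments")
project.name = "Import Historical Shipments 2025"               # placeholder new name
project.description = "Imports and cleanses shipment history"   # placeholder description
project.save()  # changes to existing objects must be saved to be committed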

Copying Macros and Tasks

Macros can be copied, either within the same project or into a different project. Tasks can also be copied, either within the same macro, between macros in the same project, or between macros of different projects. If a task is copied within the same macro, its name will automatically be suffixed with (Copy).

As an example, we will consider a macro called “Cost Data” in a project named “Data Cleansing and Aggregation NA Model”, which is configured as follows:

[Screenshot: the “Cost Data” macro in the “Data Cleansing and Aggregation NA Model” project]

The North America team shows this macro to their EMEA counterparts who realize that they could use part of this for their purposes, as their transportation cost data has the same format as that of the NA team. Instead of manually creating a new macro with new tasks that duplicate the 3 transportation cost related ones, they decide to use a script where first the whole macro is copied to a new project, and then the 4 tasks which are not relevant for the EMEA team are deleted:

  1. Connect to the existing NA project and get the macro we want to copy (“Cost Data”).
  2. Create a new EMEA-based project and use the clone method of the Macro class to copy the whole macro and all its dependencies into the new project. By using the name argument, we rename the macro to “Cost Data EMEA”. An optional description has been added too.
  3. Here, the tasks that we want to keep are listed: the Start task and the 3 transportation cost related ones. The get_tasks method is used to get a list of all the tasks in the new macro (the “Cost Data EMEA” one).
  4. This logic uses the delete_task method of the Macro class to delete the tasks currently in the Cost Data EMEA macro that we do not want to keep, as shown in the excerpt below.
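
The core of the script (the full version is listed in the Appendix) is the clone call followed by a loop that deletes the tasks we do not want to keep:

new_project = Project.create("Data Cleansing and Aggregation EMEA")
new_macro = macro.clone(new_project, name="Cost Data EMEA",
                        description="Cloned from NA project; "
                                    "keep 3 transportation tasks")

tasks_to_keep = ["Start",
                 "Import Transportation Cost Data",
                 "Cleanse TP Costs",
                 "Aggregate TP Costs by Month"]
tasks_present = new_macro.get_tasks()

for task in tasks_present:
    if task not in tasks_to_keep:
        new_macro.delete_task(task)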

After running the script, we see in DataStar that there is indeed a new project named “Data Cleansing and Aggregation EMEA” which has a “Cost Data EMEA” macro that contains the 3 transportation cost related tasks that we wanted to keep:

[Screenshot: the “Cost Data EMEA” macro in the new “Data Cleansing and Aggregation EMEA” project]

Note that another way we could have achieved this would have been to copy the 3 tasks from the macro in the NA project to the new macro in the EMEA project. The next example shows this for one task. Say that after the Cost Data EMEA macro was created, the team finds they also have a use for the “Import General Ledger” task that was deleted as it was not on the list of “tasks to keep”. In an extension of the previous script or a new one, we can leverage the add_task method of the Macro class to copy the “Import General Ledger” task from the NA project to the EMEA one:

  1. We set the project and macro variables to the NA project and its “Cost Data” macro, and to the EMEA project and its “Cost Data EMEA” macro.
  2. The get_task method is used to set task_to_copy to the “Import General Ledger” task (in the “Cost Data” macro of the NA project).
  3. Since we want to connect the “Import General Ledger” task to the Start task in the “Cost Data EMEA” macro it is being copied to, we need to get this task; it is set as the start_task variable.
  4. Now, we use the add_task method to add the task_to_copy (“Import General Ledger” from the “Cost Data” macro in the NA project) to macro_2, which is the “Cost Data EMEA” macro in the EMEA project. We use the optional auto_join and previous_task arguments to ensure the task is connected to the Start task. When auto_join is set to True or not specified (the default of True is used in that case), the task will be chained to the last task in the macro. When auto_join=False, the previous_task argument specifies which task the added task will be chained to; if previous_task is not set either, the task will not be chained to another task.
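
The relevant part of the script (the full code is in the Appendix) looks like this:

task_to_copy = macro_1.get_task("Import General Ledger")
start_task = macro_2.get_task("Start")
copied_task = macro_2.add_task(task_to_copy,
                               auto_join=False,
                               previous_task=start_task)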

After running the script, we see that the “Import General Ledger” task is now part of the “Cost Data EMEA” macro and is connected to the Start task:

Several additional helpful features on chaining tasks together in a macro are:

  • Use task.add_dependency(previous_task_name) and task.remove_dependency(previous_task_name) to connect or disconnect from a preceding task.
  • Use the get_dependencies method to get a list of a task’s preceding tasks. E.g.:
    • deps = export_task.get_dependencies()
    • print(deps)
  • Use the add_tasks method to add multiple tasks at once, either chained in order or all connected to and fanning out from 1 preceding task:
    • macro.add_tasks([task1, task2, task3]) – this will chain the tasks in the order specified
    • macro.add_tasks([taskX, taskY, taskZ], previous_task = some_task) – this will connect all 3 tasks that are added to 1 preceding task some_task

Creating Connections

DataStar connections allow users to connect to different types of data sources, including CSV-files, Excel files, Cosmic Frog models, and Postgres databases. These data sources need to be present on the Optilogic platform (i.e. visible in the Explorer application). They can then be used as sources / destinations / targets for tasks within DataStar.  

We can use scripts to create data connections:

  1. To create a connection to a CSV-file, use the DelimitedConnection class:
    1. Use the name argument to give the connection a name, otherwise connections will by default be named Connection 1, Connection 2, etc.
    2. The path to a CSV-file on the Optilogic platform will need to be specified.
    3. For CSV-files, the delimiter needs to be set to comma.
    4. Optionally, a description and encoding can be specified too.
  2. The FrogModelConnection class can be used to create a connection to a Cosmic Frog model:
    1. Use the name argument to give the connection a name and prevent it being given a default name.
    2. The model_name argument needs to be set to the name of an existing Cosmic Frog model in the user’s account. A path does not need to be specified, just the name suffices.
    3. Optionally, a description can be added.
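
The script that creates these two connections (also in the Appendix) is:

from datastar import *

shipments = DelimitedConnection(
    name="Shipment Data",
    path="/My Files/DataStar/Shipments.csv",
    delimiter=","
)

cf_global_sc_strategy = FrogModelConnection(
    name="Global SC Strategy CF Model",
    model_name="Global Supply Chain Strategy"
)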

After running this script, we see the connections have been created. In the following screenshot, the Explorer is on the left, and it shows the Cosmic Frog model “Global Supply Chain Strategy.frog” and the Shipments.csv file. The connections using these are listed in the Data Connections tab of DataStar. Since we did not specify any description, an auto-generated description “Created by the Optilogic Datastar library” was added to each of these 2 connections:

In addition to the connections shown above, data connections to Excel files (.xls and .xlsx) and PostgreSQL databases which are stored on the Optilogic platform can be created too. Use the ExcelConnection and OptiConnection classes to set up these types of connections.

Accessing the Project Sandbox Directly

Each DataStar project has its own internal data connection, the project sandbox. This is where users perform most of the data transformations after importing data into the sandbox. Using scripts, we can access and modify data in this sandbox directly instead of using tasks in macros to do so. Note that if you have a repeatable data workflow in DataStar that is used periodically to refresh a Cosmic Frog model (where you update your data sources and re-run your macros), be mindful of one-off changes made to the project sandbox through a script: macros and tasks are not updated to reflect such modifications, so the next time the workflow runs, the results may differ unless the script’s change is applied again. If you want to make such changes part of your workflow, you can add a Run Python task to your macro within DataStar.

Our “Import Historical Shipments” project has a table named customers in its project sandbox:

[Screenshot: the customers table in the project sandbox of the “Import Historical Shipments” project]
  1. We have opened the DataStar application.
  2. The bread crumb trail at the top of the Optilogic platform tells us we are in the “Import Historical Shipments” DataStar project.
  3. In DataStar, click on the Data Connections tab.
  4. Expand the Project Sandbox connection to see its contents.
  5. Click on the customers table to open it in the central part of DataStar.
  6. We notice that, due to the format of the customer names, they will not sort in numerical order of their customer number.

To make the customers sort in numerical order of their customer number, our goal in the next script is to pad the number part of the customer names with leading 0’s so all numbers consist of 4 digits. And while we are at it, we are also going to replace the “CZ” prefix with a “Cust_” prefix.

First, we will show how to access data in the project sandbox:

  1. After connecting to the “Import Historical Shipments” project, we use the get_sandbox method of the Project class to set the sandbox variable to the project sandbox of this project.
  2. We can use the read_table method to create a pandas dataframe (df) that contains the contents of the customers table.
  3. Use a print statement to write the contents of the df_customers variable to the terminal.
  4. The result of running this script is the list of customers shown in the terminal. This matches what we saw in DataStar when opening the customers table (previous screenshot).
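
These steps correspond to the following code (the Appendix includes the full example):

from datastar import *

project = Project.connect_to("Import Historical Shipments")
sandbox = project.get_sandbox()
df_customers = sandbox.read_table("customers")
print(df_customers)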

Next, we will use functionality of the pandas Python library (installed as a dependency when installing the datastar library) to transform the customer names to our desired Cust_xxxx format:

  1. We create a new dataframe, df_new_customers, for manipulation of the customername column values. Initially we set it to the df_customers dataframe as our starting point.
  2. These 3 lines manipulate the values in the customername column:
    1. This step removes the “CZ” from the start of the customer names.
    2. The code on line 9 adds 0’s to the left of the customer number so that it will be a 4-digit number (left-padding up to 4 characters with zeros).
    3. Lastly, we add “Cust_” as the prefix.
  3. Again, we will print the contents of the dataframe to the terminal.
  4. We have run the script and see that our desired changes to the values in the customername column have been made.
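
Continuing the script, the pandas transformations look like this (see also the Appendix):

df_new_customers = df_customers.copy()

df_new_customers['customername'] = df_new_customers['customername'].map(lambda x: x.lstrip('CZ'))
df_new_customers['customername'] = df_new_customers['customername'].str.zfill(4)
df_new_customers['customername'] = 'Cust_' + df_new_customers['customername']

print(df_new_customers)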

As a last step, we can now write the updated customer names back into the customers table in the sandbox. Or, if we want to preserve the data in the sandbox, we can also write to a new table as is done in the next screenshot:

We use the write_table method to write the dataframe with the updated customer names into a new table called “new_customers” in the project sandbox. After running the script, opening this new table in DataStar shows us that the updates worked:
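
The corresponding line of code is:

sandbox.write_table(df_new_customers, "new_customers")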

End-to-end Script

Finally, we will put everything we have covered above together in one script which will:

  • Create a new project and add a new macro to it.
  • Set up 1 data connection to a Shipments.csv file.
  • Access 1 existing data connection to an empty Cosmic Frog model.
  • Access the Project Sandbox built-in connection.
  • Create an Import task to import the historical shipments from the shipments.csv file into a table in the project sandbox.
  • Create 3 Run SQL tasks which will create unique customers, unique distribution centers, and aggregated customer demand from the historical shipment data.
  • Create 3 Export tasks, which will export the customers, distribution centers, and customer demand into the corresponding tables in the Cosmic Frog model.
  • Run the Macro and give regular updates on its progress while running.

We will look at this script through the next set of screenshots. For those who would like to run this script themselves, and possibly use it as a starting point to modify into their own script:  

  1. The script code can be copied from the appendix further below.
  2. On the Optilogic platform you need to create a new empty Cosmic Frog model and set up a data connection named “Cosmic Frog Model” to it from within DataStar.  
    1. To create an empty Cosmic Frog model in the Explorer application on the Optilogic platform: right-click on the folder you want to create the model in, choose “Create Cosmic Frog model”, choose the second option “Anura New Model”, type your desired name and hit Enter.
    2. To set up the data connection: open the DataStar application, click on the Create Data Connection button, use “Cosmic Frog Model” as the name for the connection, optionally write a description, set the Connection Type to Cosmic Frog Models, select the model created under the previous bullet in the list of models, and click on Add Connection.
  3. The zipped Shipments.csv file can be downloaded here (please note that the long character string at the beginning of the zip's file name is expected). To match the script, you need to upload it to your My Files/DataStar folder on the Optilogic platform. The file contains about 42.6k shipment records and has the following structure:
[Screenshot: the structure of the Shipments.csv file]
  1. Import all classes from the datastar library.
  2. The top part of the script will create a new project and add a macro to it:
    1. The new project is created with the name set as “Scripting with DataStar”. A description of what the project aims to do has been added too.
    2. A macro named “Populate 3 CF Model Tables” is added to the project.
  3. The data connections are created / connected to in the second part of the script:
    1. Using the get_sandbox method, we can now interact with the sandbox by using the variable named “sandbox”.
    2. Using the Connection.get_connection method, we have now set the “cf_model” variable to the Cosmic Frog data connection we created prior.
    3. A CSV-file connection to the shipments.csv file located in the /My Files/DataStar folder is created. If you uploaded the file to another location on the Optilogic platform, you need to update the path (line 23). The name of the connection which will be used by DataStar is “May2024-Sept2025 Shipments”; the script uses the “shipments” variable to interact with the connection.
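
The first part of the script, which creates the project, the macro, and the connections, looks like this (the full listing is in the Appendix):

from datastar import *

project = Project.create("Scripting with DataStar",
                         description="Show how to use a Python script to "
                         "create a DataStar project, add connections, create "
                         "a macro and its tasks, and run the macro.")
macro = project.add_macro(name="Populate 3 CF Model Tables")

sandbox = project.get_sandbox()
cf_model = Connection.get_connection("Cosmic Frog Model")

shipments = DelimitedConnection(
    name="May2024-Sept2025 Shipments",
    path="/My Files/DataStar/shipments.csv",
    delimiter=",")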

Next, we will create 7 tasks to add to the “Populate 3 CF Model Tables” macro, starting with an Import task:

[Screenshot: script code for the Import task and the “Create DCs” Run SQL task]
  1. The first task is an Import task, assigned to the “import_shipments_task” variable. This task will import the shipment data from the CSV-file connection into a new table “raw_shipments” in the project sandbox:
    1. The name of the task is “Import historical shipments” as set by the name argument.
    2. The data source from which the data is pulled is specified by the source_connection argument; this is the shipments CSV-file connection in this case.
    3. We want to import the data from the source_connection into the project sandbox, which is specified by the destination_connection argument.
    4. The destination_table argument specifies which table in the destination_connection the data will be imported into. Here this is a table named “raw_shipments” which will be overwritten if it already exists or created if it does not yet exist.
  2. The second task is a Run SQL task, assigned to the “create_dc_task” variable. This task will create unique distribution centers (DCs) from the values in the Origin DC column in the raw_shipments table:
    1. The name of the task is “Create DCs”.
    2. The target connection is the project sandbox.
    3. The query argument contains the SQL query to be run when the task runs as a string. This query will create a new table named “distribution_centers” in the sandbox which will have a column named “dc_name” that contains the distinct values from the origin_dc column in the raw_shipments table. Averages of the origin_latitude and origin_longitude columns are used to calculate the values for the dc_latitude and dc_longitude columns in the new table.
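
Continuing the script, these first two tasks are added as follows:

import_shipments_task = macro.add_import_task(
    name="Import historical shipments",
    source_connection=shipments,
    destination_connection=sandbox,
    destination_table="raw_shipments")

create_dc_task = macro.add_run_sql_task(
    name="Create DCs",
    connection=sandbox,
    query="""
        CREATE TABLE IF NOT EXISTS distribution_centers AS
        SELECT DISTINCT origin_dc AS dc_name,
                        AVG(origin_latitude) AS dc_latitude,
                        AVG(origin_longitude) AS dc_longitude
        FROM raw_shipments
        GROUP BY dc_name;""")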

Similar to the “create_dc_task” Run SQL task, 2 more Run SQL tasks are created to create unique customers and aggregated customer demand from the raw_shipments table:

[Screenshot: script code for the “Create Customers” and “Create Customer Demand” Run SQL tasks]
  1. These 2 Run SQL tasks are created in a very similar way as the first one, except that their names and SQL queries are different:
    1. The “Create Customers” SQL task creates a new table called “customers” in the project sandbox and uses very similar logic as the “Create DCs” task to create unique customers, just using the destination_store, destination_latitude, and destination_longitude columns from the raw_shipments table to create the cust_name, cust_latitude, and cust_longitude columns in the new table.
    2. The SQL query for the “Create Customer Demand” Run SQL task aggregates the raw_shipments to get the total demand quantity by customer and product, while filtering the shipments so that only those from July 1st, 2024, through June 30th, 2025, are included.
  2. These 2 tasks both use the auto_join=False and previous_task arguments to ensure they are chained to the import_shipments_task rather than to the last created task.

Now that we have generated the distribution_centers, customers, and customer_demand tables in the project sandbox using the 3 SQL Run tasks, we want to export these tables into their corresponding Cosmic Frog tables (facilities, customers, and customerdemand) in the empty Cosmic Frog model:

[Screenshot: script code for the Export tasks]
  1. The first Export task to be added is assigned to the export_dc_task variable:
  2. Its name is set as “Export Distribution Centers”.
  3. The sandbox is the source_connection, the data is exported from here into the destination.
  4. The table in the sandbox to be exported is set by the source_table argument, “distribution_centers” here.
  5. The connection to receive the exported data is set by the destination_connection, the cf_model connection here.
  6. The “facilities” table in the cf_model connection is set as the destination_table, so the data from the “distribution_centers” table in the sandbox will be exported into the “facilities” table in the cf_model connection.
  7. The destination_table_type indicates whether the table that the data is exported to is an existing or new table. If set to new, it will be created when the task is run. In our example here, the “facilities” table is an already existing table, so the argument is set to “existing”.
  8. In case the destination_table_type (previous bullet) is set to existing, the destination_table_action argument sets whether any pre-existing data in the destination_table should be overwritten (value = “replace”) or if the new data should be appended to the existing data (value = “append”).
  9. If column names and types between the source_table and destination_table do not match exactly, the mappings argument needs to be used to indicate which source_column maps to which destination_column and what the types of both are. Here, the dc_name source column, a text column, is mapped to the facilityname destination column, also a text column, etc. The format is a list of dictionaries with 4 key:value pairs in a dictionary for each mapping. The 4 keys are sourceType, targetType, sourceColumn, and targetColumn.
  10. Since we do not want this task to be chained to the last created one (“Create customer demand”, see above) we use the auto_join argument and set it to False.
  11. This task should be chained to the Run SQL task that creates the distribution_centers table, which is the create_dc_task, so we set the previous_task argument to this task.
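
The code for this first Export task is shown below (the other two are in the Appendix):

export_dc_task = macro.add_export_task(
    name="Export distribution centers",
    source_connection=sandbox,
    source_table="distribution_centers",
    destination_connection=cf_model,
    destination_table="facilities",
    destination_table_type="existing",
    destination_table_action="replace",
    mappings=[{"sourceType":"text","targetType":"text",
               "sourceColumn":"dc_name","targetColumn":"facilityname"},
              {"sourceType":"number","targetType":"text",
               "sourceColumn":"dc_latitude","targetColumn":"latitude"},
              {"sourceType":"number","targetType":"text",
               "sourceColumn":"dc_longitude","targetColumn":"longitude"}],
    auto_join=False,
    previous_task=create_dc_task)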

The following 2 Export tasks are created in a very similar way:

This completes the build of the macro and its tasks.

If we run it like this, the tasks will be chained in the correct way, but they will be displayed on top of each other on the Macro Canvas in DataStar. To arrange them nicely and prevent having to reposition them manually in the DataStar UI, we can use the “x” and “y” properties of tasks. Note that since we are now changing existing objects, we need to use the save method to commit the changes:

In the green outlined box, we see that the x-coordinate on the Macro Canvas for the import_shipments_task is set to 250 (line 147) and its y-coordinate to 150 (line 148). In line 149 we use the save method to persist these values.
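
For example, for the Import task (the remaining tasks are positioned the same way; see the Appendix):

import_shipments_task.x = 250
import_shipments_task.y = 150
import_shipments_task.save()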

Now we can kick off the macro run and monitor its progress:

  1. The run method is used to start the run of the just created macro “Populate 3 CF Model Tables”.
  2. The wait_for_done method is used with the verbose argument set to True. This means that the terminal will give regular updates on the status of the macro run.
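
The corresponding two lines at the end of the script are:

macro.run()
macro.wait_for_done(verbose=True)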

While the macro is running, messages written to the terminal by the wait_for_done method will look similar to following:

We see 4 messages where the status was “processing” and then a final fifth one stating the macro run has completed. Other statuses one might see are “pending” when the macro has not yet started and “errored” in case the macro could not finish successfully.

Opening the DataStar application, we can check on the DataStar start page that the project and CSV connection were created. They are indeed there, and we can open the “Scripting with DataStar” project to check the “Populate 3 CF Model Tables” macro and the results of its run:

[Screenshot: the “Populate 3 CF Model Tables” macro and its 7 tasks in DataStar]

The macro contains the 7 tasks we expect, and checking their configurations shows they are set up the way we intended.

Next, we have a look at the Data Connections tab to see the results of running the macro:

  1. We are on the Data Connections tab.
  2. This is the CSV file data connection that was created by the script.
  3. We also look at the tables in the Project Sandbox:
    1. The raw_shipments table in here is the result of running the Import task the script had added to the macro as the first task.
    2. The other 3 tables are the result of the 3 Run SQL tasks that the script also added to the macro; these now contain unique DCs (distribution_centers table), unique customers (customers table), and demand aggregated by customer and product for a 12-month period (customer_demand table).
  4. Finally, we expand the Cosmic Frog Model connection while not showing empty tables. Three of these tables have been populated by the Export tasks that were added by the script:
    1. The customerdemand table is populated and we see that the number of records is the same as that of the customer_demand table in the Project Sandbox. This table is also the selected table, which opens its preview grid in the central part of DataStar.
    2. The customers and facilities tables are also populated with the expected number of records based on the tables in the Project Sandbox that were used as the source for the export.
  5. The grid preview of the customerdemand table in the Cosmic Frog Model connection.

Appendix - Code of All Scripts

Here follows the code of each of the above examples. You can copy and paste this into your own scripts and modify them to your needs. Note that whenever names and paths are used, you may need to update these to match your own environment.

Get list of DataStar projects in user's Optilogic account and print list to terminal:

from datastar import *

project_list = Project.get_projects()
print(project_list)

Connect to the project named "Import Historical Shipments" and get the list of macros within this project. Print this list to the terminal:

from datastar import *

project = Project.connect_to("Import Historical Shipments")
macro_list = project.get_macros()
print(macro_list)

In the same "Import Historical Shipments" project, get the macro named "Import Shipments", and get the list of tasks within this macro. Print the list with task names to the terminal:

from datastar import *

project = Project.connect_to("Import Historical Shipments")
macro = project.get_macro("Import Shipments")
task_list = macro.get_tasks()
print(task_list)

Copy 3 of the 7 tasks in the "Cost Data" macro in the "Data Cleansing and Aggregation NA Model" project to a new macro "Cost Data EMEA" in a new project "Data Cleansing and Aggregation EMEA". Do this by first copying the whole macro and then removing the tasks that are not required in this new macro:

from datastar import *

# connect to project and get macro to be copied into new project
project = Project.connect_to("Data Cleansing and Aggregation NA Model")
macro = project.get_macro("Cost Data")

# create new project and clone macro into it 
new_project = Project.create("Data Cleansing and Aggregation EMEA")
new_macro = macro.clone(new_project, name="Cost Data EMEA",
                        description="Cloned from NA project; "
                                    "keep 3 transportation tasks")

# list the transportation cost related tasks to be kept and get a list
# of tasks present in the copied macro in the new project, so that we 
# can determine which tasks to delete
tasks_to_keep = ["Start",
                 "Import Transportation Cost Data",
                 "Cleanse TP Costs",
                 "Aggregate TP Costs by Month"]
tasks_present = new_macro.get_tasks()

# go through tasks present in the new macro and 
# delete if the task name is not in the "to keep" list
for task in tasks_present:
    if task not in tasks_to_keep:
        new_macro.delete_task(task)

Copy specific task "Import General Ledger" from the "Cost Data" macro in the "Data Cleansing and Aggregation NA Model" project to the "Cost Data EMEA" macro in the "Data Cleansing and Aggregation EMEA" project. Chain this copied task to the Start task:

from datastar import *

project_1 = Project.connect_to("Data Cleansing and Aggregation NA Model")
macro_1 = project_1.get_macro("Cost Data")

project_2 = Project.connect_to("Data Cleansing and Aggregation EMEA")
macro_2 = project_2.get_macro("Cost Data EMEA")

task_to_copy = macro_1.get_task("Import General Ledger")
start_task = macro_2.get_task("Start")
copied_task = macro_2.add_task(task_to_copy,
                               auto_join=False,
                               previous_task=start_task)

Creating a CSV file connection and a Cosmic Frog Model connection:

from datastar import *

shipments = DelimitedConnection(
    name="Shipment Data",
    path="/My Files/DataStar/Shipments.csv",
    delimiter=","
)

cf_global_sc_strategy = FrogModelConnection(
    name="Global SC Strategy CF Model",
    model_name="Global Supply Chain Strategy"
)

Connect directly to a project's sandbox, read data into a pandas dataframe, transform it, and write the new dataframe into a new table "new_customers":

from datastar import *

# connect to project and get its sandbox
project = Project.connect_to("Import Historical Shipments")
sandbox = project.get_sandbox()

# use pandas to read the "customers" table into a dataframe
df_customers = sandbox.read_table("customers")

# copy the dataframe into a new dataframe
df_new_customers = df_customers.copy()

# use pandas to change the customername column values format 
# from CZ1, CZ20, etc to Cust_0001, Cust_0020, etc
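# note: lstrip('CZ') removes any leading 'C' or 'Z' characters rather than the exact
# substring "CZ"; that works here because all names start with "CZ" followed by digits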
df_new_customers['customername'] = df_new_customers['customername'].map(lambda x: x.lstrip('CZ'))
df_new_customers['customername'] = df_new_customers['customername'].str.zfill(4)
df_new_customers['customername'] = 'Cust_' + df_new_customers['customername']

# write the updated customers table with the new customername
# values to a new table "new_customers"
sandbox.write_table(df_new_customers, "new_customers")

End-to-end script - create a new project and add a new macro to it; add 7 tasks to the macro to import shipments data; create unique customers, unique distribution centers, and demand aggregated by customer and product from it. Then export these 3 tables to a Cosmic Frog model:

from datastar import *

#------------------------------------
# Create new project and add macro
#------------------------------------

project = Project.create("Scripting with DataStar", 
                         description= "Show how to use a Python script to "
                         "create a DataStar project, add connections, create "
                         "a macro and its tasks, and run the macro.")
macro = project.add_macro(name="Populate 3 CF Model Tables")

#--------------------
# Get & set up connections
#--------------------

sandbox = project.get_sandbox()

cf_model = Connection.get_connection("Cosmic Frog Model")

shipments = DelimitedConnection(
    name="May2024-Sept2025 Shipments",
    path="/My Files/DataStar/shipments.csv",
    delimiter=",")



#-----------------------
# Create tasks
#-----------------------

# Import Task to import the raw shipments from the shipments CSV connection 
# into a table named raw_shipments in the project sandbox

import_shipments_task = macro.add_import_task(
    name="Import historical shipments",
    source_connection=shipments,
    destination_connection=sandbox,
    destination_table="raw_shipments")

# Add 3 run SQL tasks to create unique DCs, unique Customers, and Customer 
# Demand (aggregated by customer and product from July 2024-June 2025) 
# from the raw shipments data.

create_dc_task = macro.add_run_sql_task(
    name="Create DCs",
    connection=sandbox,
    query="""
        CREATE TABLE IF NOT EXISTS distribution_centers AS
        SELECT DISTINCT origin_dc AS dc_name, 
                        AVG(origin_latitude) AS dc_latitude, 
                        AVG(origin_longitude) AS dc_longitude 
        FROM raw_shipments
        GROUP BY dc_name;""")

create_cz_task = macro.add_run_sql_task(
    name="Create customers",
    connection=sandbox,
    query="""
        CREATE TABLE IF NOT EXISTS customers AS
        SELECT DISTINCT destination_store AS cust_name, 
                        AVG(destination_latitude) AS cust_latitude, 
                        AVG(destination_longitude) AS cust_longitude 
        FROM raw_shipments
        GROUP BY cust_name;""",
    auto_join=False,
    previous_task=import_shipments_task)

create_demand_task = macro.add_run_sql_task(
    name="Create customer demand",
    connection=sandbox,
    query="""
        CREATE TABLE IF NOT EXISTS customer_demand AS
        SELECT destination_store AS cust_name,
                productname, 
                SUM(units) AS demand_quantity 
        FROM raw_shipments
        WHERE TO_DATE(ship_date, 'DD/MM/YYYY') BETWEEN
            '2024-07-01' AND '2025-06-30' 
        GROUP BY cust_name, productname;""",
    auto_join=False,
    previous_task=import_shipments_task)

# Add 3 export tasks to populate the Facilities, Customers,
# and CustomerDemand tables in empty CF model connection

export_dc_task = macro.add_export_task(
    name="Export distribution centers",
    source_connection=sandbox,
    source_table="distribution_centers",
    destination_connection=cf_model,
    destination_table="facilities",
    destination_table_type="existing",
    destination_table_action="replace",
    mappings=[{"sourceType":"text","targetType":"text",
               "sourceColumn":"dc_name","targetColumn":"facilityname"},
              {"sourceType":"number","targetType":"text",
               "sourceColumn":"dc_latitude","targetColumn":"latitude"},
              {"sourceType":"number","targetType":"text",
               "sourceColumn":"dc_longitude","targetColumn":"longitude"}],
    auto_join=False,
    previous_task=create_dc_task)



export_cz_task = macro.add_export_task(
    name="Export customers",
    source_connection=sandbox,
    source_table="customers",
    destination_connection=cf_model,
    destination_table="customers",
    destination_table_type="existing",
    destination_table_action="replace",
    mappings=[{"sourceType":"text","targetType":"text",
               "sourceColumn":"cust_name","targetColumn":"customername"},
              {"sourceType":"number","targetType":"text",
               "sourceColumn":"cust_latitude","targetColumn":"latitude"},
              {"sourceType":"number","targetType":"text",
               "sourceColumn":"cust_longitude","targetColumn":"longitude"}],
    auto_join=False,
    previous_task=create_cz_task)



export_demand_task = macro.add_export_task(
    name="Export customer demand",
    source_connection=sandbox,
    source_table="customer_demand",
    destination_connection=cf_model,
    destination_table="customerdemand",
    destination_table_type="existing",
    destination_table_action="replace",
    mappings=[{"sourceType":"text","targetType":"text",
               "sourceColumn":"cust_name","targetColumn":"customername"},
              {"sourceType":"text","targetType":"text",
               "sourceColumn":"productname","targetColumn":"productname"},
              {"sourceType":"number","targetType":"text",
               "sourceColumn":"demand_quantity","targetColumn":"quantity"}],
    auto_join=False,
    previous_task=create_demand_task)


#--------------------------------
# Position tasks on Macro Canvas
#--------------------------------

import_shipments_task.x = 250
import_shipments_task.y = 150
import_shipments_task.save()

create_dc_task.x = 500
create_dc_task.y = 10
create_dc_task.save()

create_cz_task.x = 500
create_cz_task.y = 150
create_cz_task.save()

create_demand_task.x = 500
create_demand_task.y = 290
create_demand_task.save()

export_dc_task.x = 750
export_dc_task.y = 10
export_dc_task.save()

export_cz_task.x = 750
export_cz_task.y = 150
export_cz_task.save()

export_demand_task.x = 750
export_demand_task.y = 290
export_demand_task.save()

#-----------------------------------------------------
# Run the macro and write regular progress updates
#-----------------------------------------------------

macro.run()
macro.wait_for_done(verbose=True)

DataStar users can take advantage of the datastar Python library, which gives users access to DataStar projects, macros, tasks, and connections through Python scripts. This way users can build, access, and run their DataStar workflows programmatically. The library can be used in a user’s own Python environment (local or on the Optilogic platform), and it can also be used in Run Python tasks in a DataStar macro.

In this documentation we will cover how to use the library through multiple examples. At the end, we will step through an end-to-end script that creates a new project, adds a macro to the project, and creates multiple tasks that are added to the macro. The script then runs the macro while giving regular updates on its progress.

Before diving into the details of this article, it is recommended to read this “Setup: Python Scripts for Cosmic Frog and DataStar” article first; it explains what users need to do in terms of setup before they can run Python scripts using the datastar library. To learn more about the DataStar application itself, please see these articles on Optilogic’s Help Center.

Succinct documentation in PDF format of all datastar library functionality can be downloaded here (please note that the long character string at the beginning of the filename is expected). This includes a list of all available properties and methods for the Project, Macro, Task, and Connection classes at the end of the document.

All Python code that is shown in the screenshots throughout this documentation is available in the Appendix, so that you can copy-paste from there if you want to run the exact same code in your own Python environment and/or use these as jumping off points for your own scripts.

Getting Information about Projects and Macros

If you have reviewed the “Setup: Python Scripts for Cosmic Frog and DataStar” article and are set up with your local or online Python environment, we are ready to dive in! First, we will see how we can interrogate existing projects and macros using Python and the datastar library. We want to find out which DataStar projects are already present in the user’s Optilogic account.

  1. If we want to use the datastar python library, we have to either import it entirely (“import datastar”) or import the classes that we want to use from it. In our case, we are importing all classes via “from datastar import *”. This way each class is imported separately and can be referenced individually, e.g. Project for the project class, Macro for the macro class, etc.
  2. To find out which DataStar projects are already existing in our user account, we create a variable named project_list and set it by using a method from the Project class. After typing “Project.”, a list of available methods for the Project class comes up.
  3. We choose the “get_projects” method:

Once the parentheses are typed, hover text comes up with information about this function. It tells us that the outcome of this method will be a list of strings, and the description of the method reads “Retrieve all project names visible to the authenticated user”. Most methods will have similar hover text describing the method, the arguments it takes and their default values, and the output format.

Now that we have a variable that contains the list of DataStar projects in the user account, we want to view the value of this variable:

  1. To view the list of projects, we add a simple print statement to print the value of the project_list variable.
  2. We can run this small script by clicking on the play button at the top right of the script.
  3. The output of the print statement will appear in the terminal at the bottom of the code editor, and we see that this user has 5 DataStar projects in their user account.

Next, we want to dig one level deeper and for the “Import Historical Shipments” project find out what macros it contains:

  1. We use the connect_to method of the Project class to set our “project” variable to the “Import Historical Shipments” project. Now, whenever the Python code does something with “project” it applies to this “Import Historical Shipments” project.
  2. We create a variable “macro_list” which will contain the list of macros in this project by using the get_macros method of the Project class.
  3. Again, we will write the resulting list to the terminal using a print statement.  
  4. We have run the script and see that this project contains 1 macro named “Import Shipments”.

Finally, we will retrieve the tasks this “Import Shipments” macro contains in a similar fashion:

  1. We set the variable “macro” to the “Import Shipments” macro by using the get_macro method in the Project class. Now whenever we reference “macro”, it applies to the “Import Shipments” macro.
  2. The variable task_list will contain the tasks the macro contains by using the get_tasks method available in the Macro class.
  3. We will print the task_list to the terminal.
  4. We have run the script and see that this macro contains 4 tasks: “Start” (each macro contains a Start task as the first task), “Import Raw Shipments”, “Create customers from shipments”, and “Export Customers”.

In addition, we can have a quick look in the DataStar application to see that the information we are getting from the small scripts above matches what we have in our account in terms of projects (first screenshot below), and the “Import Shipments” macro plus its tasks in the “Import Historical Shipments” project (second screenshot below):

A screenshot of a computerAI-generated content may be incorrect.
A screenshot of a computerAI-generated content may be incorrect.

Besides getting information about projects and macros, other useful methods for projects and macros include:

  • Project.create to create a new project.
  • The project.name and macro.name properties can be set to change the name of a project or macro (need to save to commit the change).
  • The project.description and macro.description properties can be set to change the description of a project or macro (need to save to commit the change).
  • project.save and macro.save to save a changed project / macro name or description.
  • project.delete and macro.delete to remove a project or macro.

Note that when creating new objects (projects, macros, tasks or connections) these are automatically saved. If existing objects are modified, their changes need to be committed by using the save method.

Copying Macros and Tasks

Macros can be copied, either within the same project or into a different project. Tasks can also be copied, either within the same macro, between macros in the same project, or between macros of different projects. If a task is copied within the same macro, its name will automatically be suffixed with (Copy).

As an example, we will consider a macro called “Cost Data” in a project named “Data Cleansing and Aggregation NA Model”, which is configured as follows:

A diagram of a companyAI-generated content may be incorrect.

The North America team shows this macro to their EMEA counterparts who realize that they could use part of this for their purposes, as their transportation cost data has the same format as that of the NA team. Instead of manually creating a new macro with new tasks that duplicate the 3 transportation cost related ones, they decide to use a script where first the whole macro is copied to a new project, and then the 4 tasks which are not relevant for the EMEA team are deleted:

  1. Connect to the existing NA project and get the macro we want to copy (“Cost Data”).
  2. Create a new EMEA-based project and use the clone method of the Macro class to copy the whole macro and all its dependencies into the new project. By using the name argument, we rename the macro to “Cost Data EMEA”. An optional description has been added too.
  3. Here, the tasks that we want to keep are listed: the Start task and the 3 transportation cost related ones. The get_tasks method is used to get a list of all the tasks in the new macro (the “Cost Data EMEA” one).
  4. This logic uses the delete method of the Task class to delete tasks that are currently in the Cost Data EMEA macro that we do not want to keep.

After running the script, we see in DataStar that there is indeed a new project named “Data Cleansing and Aggregation EMEA” which has a “Cost Data EMEA” macro that contains the 3 transportation cost related tasks that we wanted to keep:

A screenshot of a computerAI-generated content may be incorrect.

Note that another way we could have achieved this would have been to copy the 3 tasks from the macro in the NA project to the new macro in the EMEA project. The next example shows this for one task. Say that after the Cost Data EMEA macro was created, the team finds they also have a use for the “Import General Ledger” task that was deleted as it was not on the list of “tasks to keep”. In an extension of the previous script or a new one, we can leverage the add_task method of the Macro class to copy the “Import General Ledger” task from the NA project to the EMEA one:

  1. We set the project and macro variables to the NA project and its “Cost Data” macro, and to the EMEA project and its “Cost Data EMEA” macro.
  2. The get_task method is used to set task_to_copy to the “Import General Ledger” task (in the “Cost Data” macro of the NA project).
  3. Since we want to connect the “Import General Ledger” task to the Start task in the “Cost Data EMEA” macro it is being copied to, we need to get this task; it is set as the start_task variable.
  4. Now, we use the add_task method to add the task_to_copy (“Import General Ledger” from “Cost Data” macro in the NA project) to macro_2, which is the “Cost Data EMEA” macro in the EMEA project. We use the optional auto_join and previous_task arguments to ensure the task is connected to the Start task. When setting auto_join to True or not specifying it (default of True will be used in that case), the task will be chained to the last task in the macro. When auto_join = False, the previous_task argument specifies which task the added task will be chained to and if previous_task is not set, the task will not be chained to another task.

After running the script, we see that the “Import General Ledger” task is now part of the “Cost Data EMEA” macro and is connected to the Start task:

Several additional helpful features on chaining tasks together in a macro are:

  • Use task.add_dependency(previous_task_name) and task.remove_dependency(previous_task_name) to connect or disconnect from a preceding task.
  • Use the get_dependencies method to get a list of a task’s preceding tasks. E.g.:
    • deps = export_task.get_dependencies()
    • print(deps)
  • Use the add_tasks method to add multiple tasks at once, either chained in order or all connected to and fanning out from 1 preceding task:
    • macro.add_tasks([task1, task2, task3]) – this will chain the tasks in the order specified
    • macro.add_tasks([taskX, taskY, taskZ], previous_task = some_task) – this will connect all 3 tasks that are added to 1 preceding task some_task

Creating Connections

DataStar connections allow users to connect to different types of data sources, including CSV-files, Excel files, Cosmic Frog models, and Postgres databases. These data sources need to be present on the Optilogic platform (i.e. visible in the Explorer application). They can then be used as sources / destinations / targets for tasks within DataStar.  

We can use scripts to create data connections:

  1. To create a connection to a CSV-file, use the DelimitedConnection class:
    1. Use the name argument to give the connection a name, otherwise connections will by default be named Connection 1, Connection 2, etc.
    2. The path to a CSV-file on the Optilogic platform will need to be specified.
    3. For CSV-files, the delimiter needs to be set to comma.
    4. Optionally, a description and encoding can be specified too.
  1. The FrogModelConnection class can be used to create a connection to a Cosmic Frog model:
    1. Use the name argument to give the connection a name and prevent it being given a default name.
    2. The model_name argument needs to be set to the name of an existing Cosmic Frog model in the user’s account. A path does not need to be specified, just the name suffices.
    3. Optionally, a description can be added.

After running this script, we see the connections have been created. In the following screenshot, the Explorer is on the left, and it shows the Cosmic Frog model “Global Supply Chain Strategy.frog” and the Shipments.csv file. The connections using these are listed in the Data Connections tab of DataStar. Since we did not specify any description, an auto-generated description “Created by the Optilogic Datastar library” was added to each of these 2 connections:

In addition to the connections shown above, data connections to Excel files (.xls and .xlsx) and PostgreSQL databases which are stored on the Optilogic platform can be created too. Use the ExcelConnection and OptiConnection classes to set up such these types of connections up.

Accessing the Project Sandbox Directly

Each DataStar project has its own internal data connection, the project sandbox. This is where users perform most of the data transformations after importing data into the sandbox. Using scripts, we can access and modify data in this sandbox directly instead of using tasks in macros to do so. Note that if you have a repeatable data workflow in DataStar which is used periodically to refresh a Cosmic Frog model where you update your data sources and re-run your macros, you need to be mindful of making one-off changes to the project sandbox through a script. When you change data in the sandbox through a script, macros and tasks are not updated to reflect these modifications. When running the data workflow the next time, the results may be different if that change the script made is not made again. If you want to include such changes in your macro, you can add a Run Python task to your macro within DataStar.

Our “Import Historical Shipments” project has a table named customers in its project sandbox:

A screenshot of a computerAI-generated content may be incorrect.
  1. We have opened the DataStar application.
  2. The bread crumb trail at the top of the Optilogic platform tells us we are in the “Import Historical Shipments” DataStar project.
  3. In DataStar, click on the Data Connections tab.
  4. Expand the Project Sandbox connection to see its contents.
  5. Click on the customers table to open it in the central part of DataStar.
  6. We notice that due to the format of the customer names these will not sort in numerical order of their customer number.  

To make the customers sort in numerical order of their customer number, our goal in the next script is to update the number part of the customer names with left padded 0’s so all numbers consist of 4 digits. And while we are at it, we are also going to replace the “CZ” prefix with a “Cust_” prefix.

First, we will show how to access data in the project sandbox:

  1. After connecting to the “Import Historical Shipments” project, we use the get_sandbox method of the Project class to set the sandbox variable to the project sandbox of this project.
  2. We can use the read_table method to create a pandas dataframe (df) that contains the contents of the customers table.
  3. Use a print statement to write the contents of the df_customers variable to the terminal.
  4. The result of running this script is the list of customers shown in the terminal. This matches what we saw in DataStar when opening the customers table (previous screenshot).

Next, we will use functionality of the pandas Python library (installed as a dependency when installing the datastar library) to transform the customer names to our desired Cust_xxxx format:

  1. We create a new dataframe, df_new_customers, for manipulation of the customername column values. Initially we set it to the df_customers dataframe as our starting point.
  2. These 3 lines manipulate the values in the customername column:
    1. This step removes the “CZ” from the start of the customer names.
    2. The code on line 9 adds 0’s to the left of the customer number so that it will be a 4-digit number (left-padding up to 4 characters with zeros).
    3. Lastly, we add “Cust_” as the prefix.
  1. Again, we will print the contents of the dataframe to the terminal.
  2. We have run the script and see that our desired changes to the values in the customername column have been made.

As a last step, we can now write the updated customer names back into the customers table in the sandbox. Or, if we want to preserve the data in the sandbox, we can also write to a new table as is done in the next screenshot:

We use the write_table method to write the dataframe with the updated customer names into a new table called “new_customers” in the project sandbox. After running the script, opening this new table in DataStar shows us that the updates worked:

End-to-end Script

Finally, we will put everything we have covered above together in one script which will:

  • Create a new project and add a new macro to it.
  • Set up 1 data connection to a Shipments.csv file.
  • Access 1 existing data connection to an empty Cosmic Frog model.
  • Access the Project Sandbox built-in connection.
  • Create an Import task to import the historical shipments from the shipments.csv file into a table in the project sandbox.
  • Create 3 Run SQL tasks which will create unique customers, unique distribution centers, and aggregated customer demand from the historical shipment data.
  • Create 3 Export tasks, which will export the customers, distribution centers, and customer demand into the corresponding tables in the Cosmic Frog model.
  • Run the Macro and give regular updates on its progress while running.

We will look at this script through the next set of screenshots. For those who would like to run this script themselves, and possibly use it as a starting point to modify into their own script:  

  1. The script code can be copied from the appendix further below.
  2. On the Optilogic platform you need to create a new empty Cosmic Frog model and set up a data connection named “Cosmic Frog Model” to it from within DataStar.  
    1. To create an empty Cosmic Frog model in the Explorer application on the Optilogic platform: right-click on the folder you want to create the model in, choose “Create Cosmic Frog model”, choose the second option “Anura New Model”, type your desired name and hit Enter.
    2. To set up the data connection: open the DataStar application, click on the Create Data Connection button, use “Cosmic Frog Model” as the name for the connection, optionally write a description, set the Connection Type to Cosmic Frog Models, select the model created under the previous bullet in the list of models, and click on Add Connection.
  3. The zipped Shipments.csv file can be downloaded here (please note that the long character string at the beginning of the zip's file name is expected). To match the script, you need to upload it to your My Files/DataStar folder on the Optilogic platform. The file contains about 42.6k shipment records and has the following structure:
  1. Import all classes from the datastar library.
  2. The top part of the script will create a new project and add a macro to it:
    1. The new project is created with the name set as “Scripting with DataStar”. A description of what the project aims to do has been added too.
    2. A macro named “Populate 3 CF Model Tables” is added to the project.
  3. The data connections are created / connected to in the second part of the script:
    1. Using the get_sandbox method, we can now interact with the sandbox by using the variable named “sandbox”.
    2. Using the Connection.get_connection method, we have now set the “cf_model” variable to the Cosmic Frog data connection we created prior.
    3. A CSV-file connection to the shipments.csv file located in the /My Files/DataStar folder is created. If you uploaded the file to another location on the Optilogic platform, you need to update the path (line 23). The name of the connection which will be used by DataStar is “May2024-Sept2025 Shipments”; the script uses the “shipments” variable to interact with the connection.
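The first part of the end-to-end script (shown in full in the appendix) therefore looks as follows:

from datastar import *

# create the new project and add a macro to it
project = Project.create("Scripting with DataStar",
                         description="Show how to use a Python script to "
                         "create a DataStar project, add connections, create "
                         "a macro and its tasks, and run the macro.")
macro = project.add_macro(name="Populate 3 CF Model Tables")

# get the project sandbox and the pre-existing Cosmic Frog model connection
sandbox = project.get_sandbox()
cf_model = Connection.get_connection("Cosmic Frog Model")

# create a CSV file connection to the uploaded shipments file
shipments = DelimitedConnection(
    name="May2024-Sept2025 Shipments",
    path="/My Files/DataStar/shipments.csv",
    delimiter=",")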

Next, we will create 7 tasks to add to the “Populate 3 CF Model Tables” macro, starting with an Import task:

  1. The first task is an Import task, assigned to the “import_shipments_task” variable (see the code excerpt after this list). This task will import the shipment data from the CSV-file connection into a new table “raw_shipments” in the project sandbox:
    1. The name of the task is “Import historical shipments” as set by the name argument.
    2. The data source from which the data is pulled is specified by the source_connection argument; this is the shipments CSV-file connection in this case.
    3. We want to import the data from the source_connection into the project sandbox, which is specified by the destination_connection argument.
    4. The destination_table argument specifies which table in the destination_connection the data will be imported into. Here this is a table named “raw_shipments” which will be overwritten if it already exists or created if it does not yet exist.
  2. The second task is a Run SQL task, assigned to the “create_dc_task” variable. This task will create unique distribution centers (DCs) from the values in the Origin DC column in the raw_shipments table:
    1. The name of the task is “Create DCs”.
    2. The target connection is the project sandbox.
    3. The query argument contains, as a string, the SQL query to be run when the task runs. This query will create a new table named “distribution_centers” in the sandbox which will have a column named “dc_name” that contains the distinct values from the origin_dc column in the raw_shipments table. Averages of the origin_latitude and origin_longitude columns are used to calculate the values for the dc_latitude and dc_longitude columns in the new table.
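The corresponding excerpt from the end-to-end script (full version in the appendix) is:

# Import task: shipments CSV connection -> raw_shipments table in the sandbox
import_shipments_task = macro.add_import_task(
    name="Import historical shipments",
    source_connection=shipments,
    destination_connection=sandbox,
    destination_table="raw_shipments")

# Run SQL task: create unique DCs from the raw shipments
create_dc_task = macro.add_run_sql_task(
    name="Create DCs",
    connection=sandbox,
    query="""
        CREATE TABLE IF NOT EXISTS distribution_centers AS
        SELECT DISTINCT origin_dc AS dc_name,
                        AVG(origin_latitude) AS dc_latitude,
                        AVG(origin_longitude) AS dc_longitude
        FROM raw_shipments
        GROUP BY dc_name;""")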

Similar to the “create_dc_task” Run SQL task, 2 more Run SQL tasks are created to create unique customers and aggregated customer demand from the raw_shipments table:

  1. These 2 Run SQL tasks are created in a very similar way as the first one, except that their names and SQL queries are different:
    1. The “Create Customers” SQL task creates a new table called “customers” in the project sandbox and uses very similar logic as the “Create DCs” task to create unique customers, just using the destination_store, destination_latitude, and destination_longitude columns from the raw_shipments table to create the cust_name, cust_latitude, and cust_longitude columns in the new table.
    2. The SQL query for the “Create Customer Demand” Run SQL task aggregates the raw_shipments to get the total demand quantity by customer and product, while filtering the shipments so that only those from July 1st, 2024, through June 30th, 2025, are included.
  2. These 2 tasks both use the auto_join=False and previous_task arguments to ensure they are chained to the import_shipments_task rather than to the last created task.
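For example, the “Create Customers” task is added as follows; note the auto_join=False and previous_task arguments, which chain it to the Import task (full script in the appendix):

create_cz_task = macro.add_run_sql_task(
    name="Create customers",
    connection=sandbox,
    query="""
        CREATE TABLE IF NOT EXISTS customers AS
        SELECT DISTINCT destination_store AS cust_name,
                        AVG(destination_latitude) AS cust_latitude,
                        AVG(destination_longitude) AS cust_longitude
        FROM raw_shipments
        GROUP BY cust_name;""",
    auto_join=False,
    previous_task=import_shipments_task)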

Now that we have generated the distribution_centers, customers, and customer_demand tables in the project sandbox using the 3 Run SQL tasks, we want to export these tables into their corresponding Cosmic Frog tables (facilities, customers, and customerdemand) in the empty Cosmic Frog model:

  1. The first Export task to be added is assigned to the export_dc_task variable:
  2. Its name is set as “Export Distribution Centers”.
  3. The sandbox is the source_connection; the data is exported from here into the destination_connection.
  4. The table in the sandbox to be exported is set by the source_table argument, “distribution_centers” here.
  5. The connection to receive the exported data is set by the destination_connection, the cf_model connection here.
  6. The “facilities” table in the cf_model connection is set as the destination_table, so the data from the “distribution_centers” table in the sandbox will be exported into the “facilities” table in the cf_model connection.
  7. The destination_table_type indicates whether the table that the data is exported to is an existing or new table. If set to new, it will be created when the task is run. In our example here, the “facilities” table is an already existing table, so the argument is set to “existing”.
  8. In case the destination_table_type (previous bullet) is set to existing, the destination_table_action argument sets whether any pre-existing data in the destination_table should be overwritten (value = “replace”) or if the new data should be appended to the existing data (value = “append”).
  9. If column names and types between the source_table and destination_table do not match exactly, the mappings argument needs to be used to indicate which source_column maps to which destination_column and what the types of both are. Here, the dc_name source column, a text column, is mapped to the facilityname destination column, also a text column, etc. The format is a list of dictionaries with 4 key:value pairs in a dictionary for each mapping. The 4 keys are sourceType, targetType, sourceColumn, and targetColumn.
  10. Since we do not want this task to be chained to the last created one (“Create customer demand”, see above) we use the auto_join argument and set it to False.
  11. This task should be chained to the Run SQL task that creates the distribution_centers table, which is the create_dc_task, so we set the previous_task argument to this task.
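The full call for this first Export task, including the column mappings, is (from the appendix):

export_dc_task = macro.add_export_task(
    name="Export distribution centers",
    source_connection=sandbox,
    source_table="distribution_centers",
    destination_connection=cf_model,
    destination_table="facilities",
    destination_table_type="existing",
    destination_table_action="replace",
    mappings=[{"sourceType":"text","targetType":"text",
               "sourceColumn":"dc_name","targetColumn":"facilityname"},
              {"sourceType":"number","targetType":"text",
               "sourceColumn":"dc_latitude","targetColumn":"latitude"},
              {"sourceType":"number","targetType":"text",
               "sourceColumn":"dc_longitude","targetColumn":"longitude"}],
    auto_join=False,
    previous_task=create_dc_task)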

The following 2 Export tasks are created in a very similar way:

This completes the build of the macro and its tasks.

If we run it like this, the tasks will be chained in the correct way, but they will be displayed on top of each other on the Macro Canvas in DataStar. To arrange them nicely and prevent having to reposition them manually in the DataStar UI, we can use the “x” and “y” properties of tasks. Note that since we are now changing existing objects, we need to use the save method to commit the changes:

In the green outlined box, we see that the x-coordinate on the Macro Canvas for the import_shipments_task is set to 250 (line 147) and its y-coordinate to 150 (line 148). In line 149 we use the save method to persist these values.
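For the Import task, the corresponding lines from the appendix script are:

# position the Import task on the Macro Canvas and save the change
import_shipments_task.x = 250
import_shipments_task.y = 150
import_shipments_task.save()

The same pattern is repeated for the other 6 tasks, with different x and y values.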

Now we can kick off the macro run and monitor its progress:

  1. The run method is used to start the run of the just created macro “Populate 3 CF Model Tables”.
  2. The wait_for_done method is used with the verbose argument set to True. This means that the terminal will give regular updates on the status of the macro run.
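These are the final 2 lines of the script:

# start the macro run and report its status until it finishes
macro.run()
macro.wait_for_done(verbose=True)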

While the macro is running, messages written to the terminal by the wait_for_done method will look similar to the following:

We see 4 messages where the status was “processing” and then a final fifth one stating the macro run has completed. Other statuses one might see are “pending” when the macro has not yet started and “errored” in case the macro could not finish successfully.

Opening the DataStar application, we can check on the DataStar start page that the project and CSV connection were created. They are indeed there, and we can open the “Scripting with DataStar” project to check the “Populate 3 CF Model Tables” macro and the results of its run:


The macro contains the 7 tasks we expect, and checking their configurations shows they are set up the way we intended.
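If you prefer to verify this from a script rather than in the UI, a quick check is possible with the methods covered earlier. This is a minimal sketch; it assumes, as in the earlier examples, that get_tasks returns the list of task names:

from datastar import *

# reconnect to the project and macro created by the end-to-end script
project = Project.connect_to("Scripting with DataStar")
macro = project.get_macro("Populate 3 CF Model Tables")

# list the tasks in the macro; we expect 7 of them
task_list = macro.get_tasks()
print(len(task_list), task_list)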

Next, we have a look at the Data Connections tab to see the results of running the macro:

  1. We are on the Data Connections tab.
  2. This is the CSV file data connection that was created by the script.
  3. We also look at the tables in the Project Sandbox:
    1. The raw_shipments table here is the result of running the Import task that the script added to the macro as the first task.
    2. The other 3 tables are the result of the 3 Run SQL tasks that the script also added to the macro; these now contain unique DCs (distribution_centers table), unique customers (customers table), and demand aggregated by customer and product for a 12-month period (customer_demand table).
  4. Finally, we expand the Cosmic Frog Model connection while not showing empty tables. Three of these tables have been populated by the Export tasks that were added by the script:
    1. The customerdemand table is populated, and we see that the number of records is the same as that of the customer_demand table in the Project Sandbox. This table is also the selected table, which opens its preview grid in the central part of DataStar.
    2. The customers and facilities tables are also populated with the expected number of records, based on the tables in the Project Sandbox that were used as the source for the export.
  5. The grid preview of the customerdemand table in the Cosmic Frog Model connection.

Appendix - Code of All Scripts

Here follows the code of each of the above examples. You can copy and paste these into your own scripts and modify them to your needs. Note that whenever names and paths are used, you may need to update these to match your own environment.

Get list of DataStar projects in user's Optilogic account and print list to terminal:

from datastar import *

project_list = Project.get_projects()
print(project_list)

Connect to the project named "Import Historical Shipments" and get the list of macros within this project. Print this list to the terminal:

from datastar import *

project = Project.connect_to("Import Historical Shipments")
macro_list = project.get_macros()
print(macro_list)

In the same "Import Historical Shipments" project, get the macro named "Import Shipments", and get the list of tasks within this macro. Print the list with task names to the terminal:

from datastar import *

project = Project.connect_to("Import Historical Shipments")
macro = project.get_macro("Import Shipments")
task_list = macro.get_tasks()
print(task_list)

Copy 3 of the 7 tasks in the "Cost Data" macro in the "Data Cleansing and Aggregation NA Model" project to a new macro "Cost Data EMEA" in a new project "Data Cleansing and Aggregation EMEA". Do this by first copying the whole macro and then removing the tasks that are not required in this new macro:

from datastar import *

# connect to project and get macro to be copied into new project
project = Project.connect_to("Data Cleansing and Aggregation NA Model")
macro = project.get_macro("Cost Data")

# create new project and clone macro into it 
new_project = Project.create("Data Cleansing and Aggregation EMEA")
new_macro = macro.clone(new_project,name="Cost Data EMEA",
                        description="Cloned from NA project; \
                                    keep 3 transportation tasks")

# list the transportation cost related tasks to be kept and get a list
# of tasks present in the copied macro in the new project, so that we 
# can determine which tasks to delete
tasks_to_keep = ["Start",
                 "Import Transportation Cost Data",
                 "Cleanse TP Costs",
                 "Aggregate TP Costs by Month"]
tasks_present = new_macro.get_tasks()

# go through tasks present in the new macro and 
# delete if the task name is not in the "to keep" list
for task in tasks_present:
    if task not in tasks_to_keep:
        new_macro.delete_task(task)

Copy specific task "Import General Ledger" from the "Cost Data" macro in the "Data Cleansing and Aggregation NA Model" project to the "Cost Data EMEA" macro in the "Data Cleansing and Aggregation EMEA" project. Chain this copied task to the Start task:

from datastar import *

project_1 = Project.connect_to("Data Cleansing and Aggregation NA Model")
macro_1 = project_1.get_macro("Cost Data")

project_2 = Project.connect_to("Data Cleansing and Aggregation EMEA")
macro_2 = project_2.get_macro("Cost Data EMEA")

task_to_copy = macro_1.get_task("Import General Ledger")
start_task = macro_2.get_task("Start")
copied_task = macro_2.add_task(task_to_copy,
                               auto_join=False,
                               previous_task=start_task)

Creating a CSV file connection and a Cosmic Frog Model connection:

from datastar import *

shipments = DelimitedConnection(
    name="Shipment Data",
    path="/My Files/DataStar/Shipments.csv",
    delimiter=","
)

cf_global_sc_strategy = FrogModelConnection(
    name="Global SC Strategy CF Model",
    model_name="Global Supply Chain Strategy"
)

Connect directly to a project's sandbox, read data into a pandas dataframe, transform it, and write the new dataframe into a new table "new_customers":

from datastar import *

# connect to project and get its sandbox
project = Project.connect_to("Import Historical Shipments")
sandbox = project.get_sandbox()

# use pandas to read the "customers" table into a dataframe
df_customers = sandbox.read_table("customers")

# copy the dataframe into a new dataframe so the original stays unchanged
df_new_customers = df_customers.copy()

# use pandas to change the customername column values format 
# from CZ1, CZ20, etc to Cust_0001, Cust_0020, etc
df_new_customers['customername'] = df_new_customers['customername'].map(lambda x: x.lstrip('CZ'))
df_new_customers['customername'] = df_new_customers['customername'].str.zfill(4)
df_new_customers['customername'] = 'Cust_' + df_new_customers['customername']

# write the updated customers table with the new customername 
# values to a new table "new_customers"
sandbox.write_table(df_new_customers, "new_customers")

End-to-end script - create a new project and add a new macro to it; add 7 tasks to the macro to import shipments data; create unique customers, unique distribution centers, and demand aggregated by customer and product from it. Then export these 3 tables to a Cosmic Frog model:

from datastar import *

#------------------------------------
# Create new project and add macro
#------------------------------------

project = Project.create("Scripting with DataStar", 
                         description= "Show how to use a Python script to "
                         "create a DataStar project, add connections, create "
                         "a macro and its tasks, and run the macro.")
macro = project.add_macro(name="Populate 3 CF Model Tables")

#--------------------
# Get & set up connections
#--------------------

sandbox = project.get_sandbox()

cf_model = Connection.get_connection("Cosmic Frog Model")

shipments = DelimitedConnection(
    name="May2024-Sept2025 Shipments",
    path="/My Files/DataStar/shipments.csv",
    delimiter=",")



#-----------------------
# Create tasks
#-----------------------

# Import Task to import the raw shipments from the shipments CSV connection 
# into a table named raw_shipments in the project sandbox

import_shipments_task = macro.add_import_task(
    name="Import historical shipments",
    source_connection=shipments,
    destination_connection=sandbox,
    destination_table="raw_shipments")

# Add 3 run SQL tasks to create unique DCs, unique Customers, and Customer 
# Demand (aggregated by customer and product from July 2024-June 2025) 
# from the raw shipments data.

create_dc_task = macro.add_run_sql_task(
    name="Create DCs",
    connection=sandbox,
    query="""
        CREATE TABLE IF NOT EXISTS distribution_centers AS
        SELECT DISTINCT origin_dc AS dc_name, 
                        AVG(origin_latitude) AS dc_latitude, 
                        AVG(origin_longitude) AS dc_longitude 
        FROM raw_shipments
        GROUP BY dc_name;""")

create_cz_task = macro.add_run_sql_task(
    name="Create customers",
    connection=sandbox,
    query="""
        CREATE TABLE IF NOT EXISTS customers AS
        SELECT DISTINCT destination_store AS cust_name, 
                        AVG(destination_latitude) AS cust_latitude, 
                        AVG(destination_longitude) AS cust_longitude 
        FROM raw_shipments
        GROUP BY cust_name;""",
    auto_join=False,
    previous_task=import_shipments_task)

create_demand_task = macro.add_run_sql_task(
    name="Create customer demand",
    connection=sandbox,
    query="""
        CREATE TABLE IF NOT EXISTS customer_demand AS
        SELECT destination_store AS cust_name,
                productname, 
                SUM(units) AS demand_quantity 
        FROM raw_shipments
        WHERE TO_DATE(ship_date, 'DD/MM/YYYY') BETWEEN
            '2024-07-01' AND '2025-06-30' 
        GROUP BY cust_name, productname;""",
    auto_join=False,
    previous_task=import_shipments_task)

# Add 3 export tasks to populate the Facilities, Customers,
# and CustomerDemand tables in empty CF model connection

export_dc_task = macro.add_export_task(
    name="Export distribution centers",
    source_connection=sandbox,
    source_table="distribution_centers",
    destination_connection=cf_model,
    destination_table="facilities",
    destination_table_type="existing",
    destination_table_action="replace",
    mappings=[{"sourceType":"text","targetType":"text",
               "sourceColumn":"dc_name","targetColumn":"facilityname"},
              {"sourceType":"number","targetType":"text",
               "sourceColumn":"dc_latitude","targetColumn":"latitude"},
              {"sourceType":"number","targetType":"text",
               "sourceColumn":"dc_longitude","targetColumn":"longitude"}],
    auto_join=False,
    previous_task=create_dc_task)



export_cz_task = macro.add_export_task(
    name="Export customers",
    source_connection=sandbox,
    source_table="customers",
    destination_connection=cf_model,
    destination_table="customers",
    destination_table_type="existing",
    destination_table_action="replace",
    mappings=[{"sourceType":"text","targetType":"text",
               "sourceColumn":"cust_name","targetColumn":"customername"},
              {"sourceType":"number","targetType":"text",
               "sourceColumn":"cust_latitude","targetColumn":"latitude"},
              {"sourceType":"number","targetType":"text",
               "sourceColumn":"cust_longitude","targetColumn":"longitude"}],
    auto_join=False,
    previous_task=create_cz_task)



export_demand_task = macro.add_export_task(
    name="Export customer demand",
    source_connection=sandbox,
    source_table="customer_demand",
    destination_connection=cf_model,
    destination_table="customerdemand",
    destination_table_type="existing",
    destination_table_action="replace",
    mappings=[{"sourceType":"text","targetType":"text",
               "sourceColumn":"cust_name","targetColumn":"customername"},
              {"sourceType":"text","targetType":"text",
               "sourceColumn":"productname","targetColumn":"productname"},
              {"sourceType":"number","targetType":"text",
               "sourceColumn":"demand_quantity","targetColumn":"quantity"}],
    auto_join=False,
    previous_task=create_demand_task)


#--------------------------------
# Position tasks on Macro Canvas
#--------------------------------

import_shipments_task.x = 250
import_shipments_task.y = 150
import_shipments_task.save()

create_dc_task.x = 500
create_dc_task.y = 10
create_dc_task.save()

create_cz_task.x = 500
create_cz_task.y = 150
create_cz_task.save()

create_demand_task.x = 500
create_demand_task.y = 290
create_demand_task.save()

export_dc_task.x = 750
export_dc_task.y = 10
export_dc_task.save()

export_cz_task.x = 750
export_cz_task.y = 150
export_cz_task.save()

export_demand_task.x = 750
export_demand_task.y = 290
export_demand_task.save()

#-----------------------------------------------------
# Run the macro and write regular progress updates
#-----------------------------------------------------

macro.run()
macro.wait_for_done(verbose=True)
