Customer Experience: The solution improves the experience of the data consumer in finding, understanding and using the data
Technical Compatibility: The solution is easy to implement with our current technology stack (Drupal / DKAN 7.x-1.18.8 and looking to upgrade to DKAN 1.18.12 and API Gateway)
Scalability: The solution must be scalable, i.e. capable of handling high volume datasets and conducting all basic data functions relating to publishing, organising and displaying the data without requiring third party support or ongoing maintenance
The team has the expertise and ability to deliver a technical solution within the outlined timeframes
The team is collaborative, innovative and possesses strong communication skills
The solution provides value for money for Transport for NSW
Frequently Asked Questions (FAQs)
What type of users use the Open Data Hub?
The Open Data Hub is used by all kinds of people, groups and organisations including data consumers from within the Transport for NSW cluster, within NSW Government and general data consumers. Open data is used in a limitless number of ways, such as:
- Research for academic applications and public policy
- Technological applications including apps (like public transport planning apps and weather apps), alert systems (such as flood and earthquake alerts)
- Tertiary education purposes
What seed funding is available for the Innovation Challenge?
A seed funding pool of up to $90,000 is available with each solution being eligible for up to $30,000. Solutions will be evaluated on value for money.
When does the solution have to be delivered by?
We expect a solution to be delivered by March 2021 for integration and deployment in April 2021.
What integration is required for the solution to the Open Data Hub?
Our platform can proxy APIs and support DKAN 1.18.x compatible extensions. Your solution could also be a standalone tool.
How do I ask questions about this challenge?
Please ask any questions in the dedicated thread on the Open Data Forum.
What if I just have ideas that are more process than technology driven for how to make the Open Data Hub and data better?
Please provide us with this feedback as we want to make data better! However this will not be eligible for the Make Data Great Innovation Challenge.
Do I have to meet all of the objectives?
Not necessarily – just advise in the application which objective your solution will address.
Who is on the judging panel?
The judging panel will be stakeholders and subject matter experts familiar with the data available on Open Data and/or the technology stack for the Open Data Hub.
Will Transport for NSW be licensing the solution or buying the solution? Who owns the IP?
As part of the terms and conditions, the solution may be seed funded. It is expected the solution will be provided and licensed to Transport for NSW to integrate and use with the Open Data Hub with no expiry. The Intellectual Property will belong to the developer and can be licensed to other jurisdictions or organisations.
Q&As from the Virtual Information Session
UPDATE 25 NOVEMBER: The slide pack presented indicated that DKAN = CKAN + Drupal however we would like to correct that and point you here instead. Any modules to be integrated would need to be DKAN/Drupal 7.x modules. The search engine used on the Open Data Hub is the Drupal Search API (not Apache Solr).
The following questions were posed to the Panel and most were answered during the Information Session. Additional information has been provided for clarity in some of these beyond what was mentioned at the Information Session. Questions we were unable to answer due to time have also been addressed. For any other questions, please refer to “Where can I ask further questions?”
Can you tell us more about IP (Intellectual Property) in regards to this challenge?
Any Intellectual Property (IP) you create as part of your solution for this challenge remains wholly owned by you. This means you can freely sell, license and use the IP elsewhere. We will expect unlimited use of the IP in perpetuity, in line with our preference for no ongoing operational costs. Intellectual Property in this case includes code, tools, software, API that may be developed for your solution.
Can everyone access the data on the Open Data Hub?
Open Data is open to everyone. You need to register in order to see the data. It is a free registration.
Because one of the options is to create a plugin for CKAN, could plugins released as part of this challenge also be released for everyone under the same AGPL license of those packages and able to be used by other government sites which use these platforms?
Absolutely. Once you’ve built the module, that is your IP and you can sell, license and use it as you wish.
How many searches are performed per month?
Unfortunately, we have not been tracking these metrics and are unable to provide this information.
Do you have an estimate of the size of all datasets available on the Open Data Hub?
Please refer to the answer below.
How much data is there to be indexed?
Not all data is hosted on the Open Data Hub. There is some data which is hosted on other data portals or sites. Currently there is over 20GB of data stored in infrastructure we control.
The data is very Sydney centric, and when we’ve requested the data eg bus usage for regional LGAs, we’ve been given a “data privacy” excuse for why the data can’t be provided. Please explain the reason for the different policies for Sydney/non Sydney LGA data, and plans to be more open with data for NSW outside of Sydney.
We’ve been providing patronage data across NSW, though data may have significant holes, which is one of the key things we’d like to improve through the solutions chosen for this innovation challenge. Thank you for the question -- we will follow up and see what other data we can publish.
Why do you have to login just to access information?
The decision to require users to be registered and logged in to access data was a decision made early on for the Open Data Hub. This ensures we are protected from bots, and enables us to provide access-controlled datasets in addition to open data.
Who will we pitch to?
We have a panel who will screen all entries, and shortlist candidates who will be invited to pitch. On the pitch day, there will be a judging panel made up of Subject Matter Experts (SMEs) who are mainly internal stakeholders, as well as some from commercial entities.
Can I submit more than one idea?
Yes – please submit each idea as a separate application. We are unlikely to select more than one idea from the same team, so that the team can focus on a single idea.
Does this challenge include making Open Transport data easier to integrate with open data published by other clusters to deliver information that is useful across government and to the public? Or is there a narrow focus on unlocking value from Transport data by itself?
We’re focused on our data and what’s available on the Open Data Hub. The challenge is for the Transport for NSW Open Data Hub data.
Can we have the data in uniform formats?
Data formats used on the Open Data Hub vary and are based on the format in which the data owners provided it to us. Uniformity of data is an option for those who are still seeking ideas for solutions for this challenge!
Is this challenge limited to the datasets currently available on the Open Data Hub?
If you can make something that applies to all CSVs or all geospatial data, including from other websites, that is fine.
Are there any limitations on the backend technologies?
We don’t have any preference in terms of the technology that is used – we are open to all. Ideally the solution will not require ongoing operational costs, so please take this into consideration.
Are you looking for an integrated/seamless data visualisation/search tool as well?
Yes, we are definitely interested in this kind of tool, especially in terms of search functionality. Our users often have issues searching through our entire data catalogue.
Is the seed funding only for the solution or for end to end implementation?
This is the total amount you will receive from us for the solution and its implementation. Any implementation costs on our end we will cover.
Hi, has an exercise like this been done before? And if yes, whether the outcome was acceptable to TfNSW?
While we have run many Innovation Challenges, the Make Data Great challenge is the first of its kind. It’s different to our regular innovation challenges, which are focused on customer outcomes for travel on public transport, roads, or waterways. The Make Data Great challenge is instead focused on making the Open Data Hub great!
If there are limitations which mean e.g. the most effective browsing/searching solution isn't feasible directly using the CKAN API and database, would it be ok if the data be externally synchronized to our own storage system/db? Doing this could mean the indexes/data could be a few hours old for instance.
We would definitely consider that. Other than our APIs, most of our data is static. We want to make data great and easier to understand, so if you can transform or re-present it to those ends, that would meet our requirements.
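As a rough illustration of the external synchronisation idea raised in the question, the sketch below decides which resources need to be copied again by comparing last-modified timestamps. It assumes the source catalogue reports a modification time per resource; the function name, the dictionary layout and the sample ids are all hypothetical and are not part of the Open Data Hub or DKAN.

```python
from datetime import datetime, timezone
from typing import Dict, List


def resources_to_refresh(catalogue: Dict[str, datetime],
                         local_index: Dict[str, datetime]) -> List[str]:
    """Return the ids of resources whose local copy is missing or outdated.

    catalogue:   resource id -> last-modified time reported by the source
    local_index: resource id -> last-modified time of the copy we hold
    """
    stale = []
    for resource_id, source_modified in catalogue.items():
        synced_modified = local_index.get(resource_id)
        # Refresh when we have never synced, or the source changed since.
        if synced_modified is None or source_modified > synced_modified:
            stale.append(resource_id)
    return stale


if __name__ == "__main__":
    utc = timezone.utc
    # Made-up ids and dates, purely for illustration.
    catalogue = {
        "res-a": datetime(2020, 12, 1, tzinfo=utc),
        "res-b": datetime(2020, 12, 31, tzinfo=utc),
        "res-c": datetime(2020, 11, 15, tzinfo=utc),
    }
    local_index = {
        "res-a": datetime(2020, 12, 1, tzinfo=utc),
        "res-b": datetime(2020, 12, 1, tzinfo=utc),
    }
    print(resources_to_refresh(catalogue, local_index))
```

Run on a schedule (every few hours, say), a check like this keeps the external copy at most one interval behind the source, which suits mostly static data.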
Will a recording of the information session be available on the Open Data Hub?
Yes, this is now available here.
Is there already some infrastructure available?
We can use internal infrastructure if it’s part of your solution.
If we're not expected to build something until after Christmas, then what are we submitting on Monday 7 Dec?
You’re submitting your idea for the solution, along with as much detail as possible (technical and otherwise) about how you plan to execute it.
How do we apply for seed funding?
Seed funding information was included in the entry form.
How do I enter my submission?
Submissions closed at midnight on Monday 7 December 2020.
What’s the difference between data.gov and the Open Data hub?
The Open Data Hub is the Transport for NSW open data portal. There is a NSW data portal as well as a National (Australian) data portal. Data.gov is the US data portal.
Can a solution be provided as an R package or a Python library for accessing the datasets on the Open Data Hub?
Definitely - there are no issues around that.
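As a rough illustration of what a Python library of this kind might wrap, the sketch below builds request URLs for the CKAN-style catalogue endpoint and the datastore search endpoint described in the DKAN 7.x-1.x documentation. The base URL is a placeholder, the function names are hypothetical, and the endpoint paths should be checked against the target site. Registration, login and API keys are not modelled here.

```python
from urllib.parse import urlencode, urljoin

# Placeholder base URL (an assumption); substitute the portal you are targeting.
BASE_URL = "https://example.org/"


def package_show_url(dataset_id: str, base_url: str = BASE_URL) -> str:
    """Build the URL for the CKAN-style package_show action (dataset metadata)."""
    query = urlencode({"id": dataset_id})
    return urljoin(base_url, "api/3/action/package_show") + "?" + query


def datastore_search_url(resource_id: str, limit: int = 10, offset: int = 0,
                         base_url: str = BASE_URL) -> str:
    """Build the URL for the DKAN 1.x datastore search endpoint (tabular rows)."""
    query = urlencode({"resource_id": resource_id,
                       "limit": limit,
                       "offset": offset})
    return urljoin(base_url, "api/action/datastore/search.json") + "?" + query
```

A real package would add the HTTP calls, paging and error handling on top of helpers like these. Keeping URL construction separate makes it easy to test without network access.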
Are data quality audits important in the context of this challenge? Is there a budget for this?
Data quality in terms of any data that is transformed from data available on the Open Data Hub is very important. If you are offering a data quality tool, that could be of interest to us, but we’d need to better understand your offering. Ask the question -- will it make data great? If so, we’re interested!
What kind of solutions are you expecting?
We are expecting DKAN modules, APIs or standalone tools that Make Data Great! Please refer to the details provided on the Innovation Challenge pages for more information.
Is the data available on the Open Data Hub preprocessed, or do we need to do some preprocessing?
Most of the data on the Open Data Hub is preprocessed and our team often does some light touch processing prior to publication. The data provided on the Open Data Hub is provided on an “as is” basis for the data consumer. For this challenge, we do want to #makedatagreat so we’re expecting data to be further processed.
Do you have a Data Management/ Data Quality framework around the use of the platform & can this be shared?
Transport for NSW and NSW Government do have policies around Data and Information Management. Please refer to the Digital.NSW site for more information. For Open Data, we provide data from mainly operational systems. There’s a good reason we’re asking for help to #makedatagreat.
How do you envision this activity moving into the commercial phase for ongoing revenue generation?
We are only able to provide seed funding. Other jurisdictions or agencies or organisations will likely be able to benefit from your solution or skillset.
What search engine is used on the Open Data Hub?
Our search engine is the DKAN Search API. More information can be found on the DKAN (https://dkan.readthedocs.io/en/latest/components/search.html) and Drupal (https://www.drupal.org/project/search_api) sites. We have also implemented a module that enables Facet API in the search.
Do you have plans to upgrade to DKAN v2?
This is a possibility, however their roadmap doesn't have the functionality we require until later in 2021.
Is there an API to register an application and create an API key? Is this functionality part of DKAN or custom written?
Not that we are aware of. The API Key functionality is custom/part of the API gateway.
Does judging criteria item #3 suggest that there should be no ongoing costs, or no ongoing costs that are affected by scale?
If there are ongoing operational costs please let us know in the Application/Entry form. We will take that into consideration.
If the original data is not in plain text, such as PDF or image, is there any expectation to transform the data into some form of text, or just a text in description is sufficient?
Please outline your solution in the Application/Entry form. There is no “sufficient” as we are trying to make data easier to consume.
If I create an API, is there a tech stack required or can I use, say for example Flask?
There is no particular tech stack required -- the customer experience is key here. Documentation of your API would be required. Depending on the utility of the API we will want documentation as part of your solution.
What is the dataset growth rate of the Open Data Hub?
We are publishing about 2-3 different datasets per week. But it does depend on the week and the type of data. Some datasets have hundreds of resources. Others may have one small CSV.
How is data aged? Do you remove data that is older than "x" years?
The purpose of Open Data is to make data accessible and open. We are often asked for historical data, so it depends on the data. If the data is superseded we will archive it; however, there has to be a real reason before we do this. The Open Data Hub has been available since 2016 so it is still very young. There may be a future policy to remove data that is considered “old”.
Can we have the datasets setup in a database with relationships defined (EDBMS) that will organise data based on location or transport mode, so we don’t have to connect to multiple datasets\formats?
That’s great feedback and can be a possible option for those who are still seeking ideas for solutions to put in their Application/Entry forms!
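For anyone exploring this idea, the sketch below shows one possible relational layout in which datasets are linked to a transport mode and a location, so that a single query replaces connecting to several datasets and formats. It uses an in-memory SQLite database. The table names, column names and sample rows are invented for illustration and do not describe the Open Data Hub.

```python
import sqlite3

# Hypothetical schema: each dataset row references one mode and one location.
SCHEMA = """
CREATE TABLE transport_mode (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE dataset (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    source_format TEXT NOT NULL,
    mode_id INTEGER REFERENCES transport_mode(id),
    location_id INTEGER REFERENCES location(id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database populated with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO transport_mode (id, name) VALUES (?, ?)",
                     [(1, "bus"), (2, "ferry")])
    conn.executemany("INSERT INTO location (id, name) VALUES (?, ?)",
                     [(1, "Region A"), (2, "Region B")])
    conn.executemany(
        "INSERT INTO dataset (title, source_format, mode_id, location_id) "
        "VALUES (?, ?, ?, ?)",
        [("Example bus patronage", "csv", 1, 1),
         ("Example ferry timetable", "json", 2, 1),
         ("Example bus stops", "geojson", 1, 2)],
    )
    return conn


def datasets_for(conn: sqlite3.Connection, mode: str, location: str):
    """List (title, source_format) for datasets matching a mode and a location."""
    rows = conn.execute(
        """
        SELECT d.title, d.source_format
        FROM dataset d
        JOIN transport_mode m ON m.id = d.mode_id
        JOIN location l ON l.id = d.location_id
        WHERE m.name = ? AND l.name = ?
        ORDER BY d.title
        """,
        (mode, location),
    )
    return rows.fetchall()
```

With this layout, datasets_for(build_demo_db(), "bus", "Region A") returns only the bus datasets recorded for that location, whatever their original format.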
Where can I ask further questions?
For questions about our data, please use the Open Data Forum. For questions about this challenge that have not already been answered in these FAQs, please email us at OpenDataProgram@transport.nsw.gov.au.
Can we have a list of the current extensions as well as access to a development environment?
No development environment will be provided for integration; however, you are able to download and host your own version of DKAN via https://github.com/GetDKAN/dkan. Note: the Transport for NSW Open Data Hub is currently using DKAN 7.x-1.18.12.
The following extensions have been included in the TfNSW Open Data Hub environment:
Better Exposed Filters (better_exposed_filters)
BUEditor Plus (bueditor_plus)
Chaos tools (ctools)
CKAN Schema (open_data_schema_ckan)
Collapsible region and pane style (panels_style_collapsible)
Color Field (color_field)
Contextual links (contextual)
Current Search Blocks (current_search)
Database search (search_api_db)
Date API (date_api)
Date Popup (date_popup)
Date Views (date_views)
DCAT Schema (open_data_schema_dcat)
DKAN Data Dashboard (dkan_data_dashboard)
DKAN Data Story (dkan_data_story)
DKAN Dataset (dkan_dataset)
DKAN Dataset Content Types (dkan_dataset_content_types)
DKAN Dataset REST API (dkan_dataset_rest_api)
DKAN Dataset Search (dkan_dataset_search)
DKAN Datastore (dkan_datastore)
DKAN Datastore API (dkan_datastore_api)
DKAN Datastore Fast Import (dkan_datastore_fast_import)
DKAN Datastore Simple Import (dkan_datastore_simple_import)
DKAN Default Topics (dkan_default_topics)
DKAN Fixtures (dkan_fixtures)
DKAN Harvest (dkan_harvest)
DKAN Harvest Dashboard (dkan_harvest_dashboard)
Dkan Harvest Data Json (dkan_harvest_datajson)
DKAN In-Place Editor (dkan_ipe)
DKAN Link Checker (dkan_linkchecker)
DKAN Migrate Base (dkan_migrate_base)
DKAN Panels (dkan_sitewide_panels)
DKAN Permissions (dkan_permissions)
DKAN Plugins (dkan_plugins)
DKAN Sitewide (dkan_sitewide)
DKAN Sitewide Menu (dkan_sitewide_menu)
DKAN Sitewide Panelizer (dkan_sitewide_panelizer)
DKAN Sitewide Search (dkan_sitewide_search_db)
DKAN Sitewide User (dkan_sitewide_user)
DKAN Topics (dkan_topics)
Dkan Workflow Permissions (dkan_workflow_permissions)
Double field (double_field)
Entity API (entity)
Entity Construction Kit (eck)
Entity Dependency API (entity_dependency)
Entity Path (entity_path)
Entity Reference (entityreference)
Entity Reference View Widget (entityreference_view_widget)
Entity tokens (entity_token)
Facet API (facetapi)
Facet API Bonus (facetapi_bonus)
Facet API Pretty Paths (facetapi_pretty_paths)
Facet Links with Icons (facet_icons)
Fast Token Browser (fast_token_browser)
Features Roles Permissions (features_roles_permissions)
Field Group (field_group)
Field Group Table (field_group_table)
Field Hidden (field_hidden)
Field Permissions (field_permissions)
Field reference delete (field_reference_delete)
Field SQL storage (field_sql_storage)
Field UI (field_ui)
Fieldable Panels Panes (fieldable_panels_panes)
File Entity (file_entity)
File Field Sources (filefield_sources)
Font Icon Select (font_icon_select)
Image URL Formatter (image_url_formatter)
Imagecache Actions (imagecache_actions)
Imagecache Canvas Actions (imagecache_canvasactions)
Insert Block (insert_block)
Job Scheduler (job_scheduler)
jQuery Update (jquery_update)
Leaflet Widget for Geofield (leaflet_widget)
Manual Crop (manualcrop)
Markdown Editor for BUEditor (markdowneditor)
Markdown filter (markdown)
Media Internet Sources (media_internet)
Media WYSIWYG (media_wysiwyg)
Media: Vimeo (media_vimeo)
Media: YouTube (media_youtube)
Memcache Storage (memcache_storage)
Menu Block (menu_block)
Menu HTML (menu_html)
Menu Token (menu_token)
Modules Weight (modules_weight)
Open Data Schema Map (open_data_schema_map)
Open Data Schema Map DKAN (open_data_schema_map_dkan)
Page manager (page_manager)
Panels In-Place Editor (panels_ipe)
Panopoly Images (panopoly_images)
Panopoly Widgets (panopoly_widgets)
Path Breadcrumbs (path_breadcrumbs)
Path Breadcrumbs UI (path_breadcrumbs_ui)
POD Schema (open_data_schema_pod)
Project Open Data Schema XML Output (open_data_schema_map_xml_output)
Radix Layouts (radix_layouts)
Recline.js Field (recline)
Reference Field Synchronization (for entityreference) (ref_field_sync)
Remote stream wrapper (remote_stream_wrapper)
REST Server (rest_server)
RESTful web services (restws)
Rules UI (rules_admin)
Search API (search_api)
Search Facets (search_api_facetapi)
Search Views (search_api_views)
Select (or other) (select_or_other)
Services Raw Response Formatter (services_rrf)
Simple Google Maps (simple_gmap)
String Overrides (stringoverrides)
Taxonomy Fixtures (taxonomy_fixtures)
Taxonomy menu (taxonomy_menu)
Token tweaks (token_tweaks)
Universally Unique ID (uuid)
Update manager (update)
Views Aggregator Plus (views_aggregator)
Views Autocomplete Filters (views_autocomplete_filters)
Views Bulk Operations (views_bulk_operations)
Views content panes (views_content)
Views Data Export (views_data_export)
Views JSON (views_json)
Views Reference Filter (entityreference_filter)
Views Responsive Grid (views_responsive_grid)
Views UI (views_ui)
Visualization Entity (visualization_entity)
Visualization Entity Charts (visualization_entity_charts)
Visualization Entity Charts DKAN (visualization_entity_charts_dkan)
Visualization Entity Embed (visualization_entity_embed)
Visualization Entity Recline Field Reference (visualization_entity_recline_field_reference)