Monthly Archives: July 2022

Query from Apache Druid using Power Query or CLI

Using Power Query

Data retrieval example from Apache Druid with authentication. Put following snippet into Microsoft Excel Power Query’s Advanced Editor (New Query > From Other Sources > Blank Query):

let
    url = "http://<druid-host>:8888/druid/v2/sql/",
    headers = [#"Content-Type" = "application/json", #"Authorization" = "Basic <druid-credentials-encoded-base64>"],
    postData = Text.ToBinary("{""query"": ""select * from wikipedia limit 10000"",""resultFormat"":""csv"",""header"":""true""}"),
    response = Web.Contents(
        url,
        [
            Headers = headers,
            Content = postData
        ]
    ),
    csvResponse = Csv.Document(response)
in
    csvResponse

<druid-credentials-encoded-base64> is username:password format.

Also note that authentication type on Power Query must be anonymous (default).

Using CLI

Create a new file , for example, named query.json containing following content:

{
  "queryType" : "topN",
  "dataSource" : "wikipedia",
  "intervals" : ["2016-06-27/2016-06-28"],
  "granularity" : "all",
  "dimension" : "page",
  "metric" : "count",
  "threshold" : 10,
  "aggregations" : [
     {
       "type" : "count",
       "name" : "count"
      }
   ]
}

Execute following shell command (curl is required but jq is optional):

$ curl -sS -X 'POST' -H 'Content-Type:application/json' -d @query.json http://<username>:<password>@<druid-host>:8888/druid/v2 | jq