Documentation
The Lumen API allows third parties to search the database and, if the third party has a submitter authentication token, submit new notices to the Lumen Database.
Searches are disabled unless you have an authentication token. Please see Authentication for information about searching with an authentication token.
The sample code below is provided for your convenience but will need to be modified for your use case. Unfortunately Lumen is unable to offer programming assistance.
Authentication
Requests for notice data or searches through the API are not permitted without an Authentication Token. Submissions of data or notice creation through the API must be authenticated by including an Authentication Token.
With an authentication token, data requests are still throttled, but at a much higher limit (approximately one request per second). This limit was chosen to protect server resources while rarely or never interfering with most researchers' uses. However, if you are making a large number of automated requests, please sleep for at least one second between requests to avoid hitting the cap.
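The pacing described above can be sketched in Python as follows. The `Throttle` class, its method, and its parameters are illustrative names and not part of the Lumen API; only the one-second interval comes from the guidance above.

```python
import time


class Throttle:
    """Keep consecutive calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first call

    def wait(self):
        """Sleep just long enough to respect the minimum interval."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each request is one way to stay under the cap; the `clock` and `sleep` arguments exist only so the helper can be exercised without real delays.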
The Authentication Token may be included as a URL Parameter, an HTTP Request Header, or (when submitting data) as form data or JSON data.
Examples: Getting Data from Lumen
URL Parameter: authentication_token
HTTP Request Header: X-Authentication-Token
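As a rough illustration, the two placements might look like this with Python's standard library. The helper names and the user agent string are assumptions made for this sketch; the parameter name `authentication_token` and the header name `X-Authentication-Token` are the ones listed above. A custom user agent is set because default user agents are blocked (see Request a Notice).

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://lumendatabase.org"
# Hypothetical user agent string; choose one that identifies your project.
USER_AGENT = "my-research-script/0.1"


def request_with_url_token(path, token, params=None):
    """Build a GET request that carries the token as a URL parameter."""
    query = dict(params or {})
    query["authentication_token"] = token
    url = f"{BASE_URL}{path}?{urlencode(query)}"
    return Request(url, headers={"User-Agent": USER_AGENT})


def request_with_header_token(path, token, params=None):
    """Build a GET request that carries the token in a request header."""
    url = f"{BASE_URL}{path}"
    if params:
        url += "?" + urlencode(params)
    headers = {"User-Agent": USER_AGENT, "X-Authentication-Token": token}
    return Request(url, headers=headers)
```

Either request object can be passed to `urllib.request.urlopen`. The header form keeps the token out of URLs that may end up in logs, which is a design preference rather than a Lumen requirement.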
Examples: Sending Data to Lumen
URL Parameter: authentication_token
Request Header: X-Authentication-Token
Troubleshooting
Did you get an HTTP 429 (Too Many Requests) response and a message saying you were browsing too fast?
Your API token is either missing or incorrectly applied. Check your request against the examples above.
Did you get a 'command not found' error using curl on the command line?
You may have to put the URL in quotes if it contains special characters.
Did you get an HTTP 401 (Unauthorized) response?
Your API token is either missing or incorrectly applied. Check your request against the examples above.
Getting an API token
Email team@lumendatabase.org and describe your use case. Lumen API tokens are intended for research use only; please read the API Terms of Use before requesting a token.
Request a Notice
Method: GET
Endpoint: https://lumendatabase.org/notices/<notice id>.json
Example Request
⚠️ Note: a custom user agent is required. Default user agents such as "curl" are blocked by our servers.
You may have to put the URL in quotes if it contains special characters.
Successful Responses
Returns a JSON-encoded representation of selected notice attributes. Mapped attributes are applied according to the Notice Type (see Notice Type Mapping), and the notice is nested under a root key that names its type.
Example Successful Response
In cases where infringing or copyrighted URLs were not submitted to Lumen, the associated field will say [{ "url": "No URL submitted" }].
Unsuccessful Responses
Return a 404 HTTP status header.
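A minimal sketch of requesting a single notice from Python, assuming the token is sent in the `X-Authentication-Token` header. The function names and the user agent value are hypothetical; the endpoint and the 404 behavior are taken from the description above.

```python
import json
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def notice_url(notice_id):
    """Return the JSON endpoint for a single notice."""
    return f"https://lumendatabase.org/notices/{int(notice_id)}.json"


def fetch_notice(notice_id, token, user_agent="my-research-script/0.1"):
    """Fetch one notice; return the parsed JSON, or None on a 404."""
    request = Request(
        notice_url(notice_id),
        headers={
            # Hypothetical value; default user agents are blocked.
            "User-Agent": user_agent,
            "X-Authentication-Token": token,
        },
    )
    try:
        with urlopen(request) as response:
            return json.load(response)
    except HTTPError as error:
        if error.code == 404:
            return None
        raise  # other statuses (401, 429, ...) are left to the caller
```

Because the notice is nested under a root key that names its type, callers will typically need to look at the single top-level key of the returned dictionary first.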
Request a list of Topics
Method: GET
Endpoint: https://lumendatabase.org/topics.json
Successful Responses
Return a JSON-encoded array of topics, including the following attributes:
id (Integer): The unique ID used for the topic_ids array during notice creation.
name (String): The topic name.
parent_id (Integer): The parent topic_id of this topic, or "null" if this is a root topic.
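Since topics form a hierarchy through parent_id, a client may want to group them or resolve a topic's ancestry. The sketch below assumes the response has already been decoded with `json.loads` (so JSON null arrives as `None`); the function names and the sample topics are made up for illustration.

```python
from collections import defaultdict


def group_topics_by_parent(topics):
    """Map each parent_id (None for root topics) to its child topics."""
    children = defaultdict(list)
    for topic in topics:
        children[topic["parent_id"]].append(topic)
    return dict(children)


def topic_path(topic_id, topics):
    """Return topic names from the root down to the given topic."""
    by_id = {topic["id"]: topic for topic in topics}
    names = []
    current = by_id.get(topic_id)
    while current is not None:
        names.append(current["name"])
        current = by_id.get(current["parent_id"])
    return list(reversed(names))


# Made-up sample data in the documented shape; not real Lumen topics.
sample_topics = [
    {"id": 1, "name": "Copyright", "parent_id": None},
    {"id": 2, "name": "Anticircumvention", "parent_id": 1},
    {"id": 3, "name": "Defamation", "parent_id": None},
]
```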
Example Successful Response
Search notices via fulltext
The Lumen website allows for paginatable full-text searches of notices and relevant metadata. Results are sorted with the most relevant at the top. Notice search results contain the same data as an individually requested Notice, with an additional "score" field that indicates how relevant the result is to the query term. Higher numbers are more relevant.
With an API token, results can be extensive; consider requesting gzipped content (if using curl, add -H "Accept-Encoding: gzip").
Lumen is unable to return past the 10,000th result due to limitations in Elasticsearch. This means that querying deeper than page=1000 (with the default per_page of 10) will fail.
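The relationship between per_page and the deepest usable page can be sketched as simple arithmetic. This assumes the cap applies to page × per_page, which matches the page=1000 example above but is otherwise an inference; the function name is illustrative.

```python
MAX_RESULTS = 10_000  # deepest result the search can return, per the text above


def last_reachable_page(per_page=10):
    """Return the highest page number that stays within the result cap."""
    if per_page < 1:
        raise ValueError("per_page must be a positive integer")
    return MAX_RESULTS // per_page
```

With the default per_page of 10 this gives 1000; with per_page=25 it gives 400. Searches that need to go deeper have to be narrowed instead, for example with a date facet.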
Terms are joined with an 'OR' by default.
Method: GET
Endpoint: https://lumendatabase.org/notices/search?term=url_escaped_query&sender_name=Joe%20Smith&page=1
Parameters
term: The full-text query term. To perform exact searches, enclose your search term in double quotes (""). This ensures that the search engine looks for the exact phrase as it is written. Searching for "full-text query" will return results containing precisely this phrase.
term-require-all: If present, all words in the term query are required for a notice to be considered a match.
title: Search in the title field.
title-require-all: If present, all words in the title query are required for a notice to be considered a match.
topics: Search within a notice's topics.
topics-require-all: If present, all words in the topics query are required for a notice to be considered a match.
tags: Search within a notice's tags.
tags-require-all: If present, all words in the tags query are required for a notice to be considered a match.
jurisdictions: Search within a notice's jurisdictions.
jurisdictions-require-all: If present, all words in the jurisdictions query are required for a notice to be considered a match.
sender_name: Search in the sender's name.
sender_name-require-all: If present, all words in the sender_name query are required for a notice to be considered a match.
principal_name: Search in the principal's name.
principal_name-require-all: If present, all words in the principal_name query are required for a notice to be considered a match.
recipient_name: Search in the recipient's name.
recipient_name-require-all: If present, all words in the recipient_name query are required for a notice to be considered a match.
works: Search within a work's description.
works-require-all: If present, all words in the works query are required for a notice to be considered a match.
action_taken: Search based on the action taken on a notice.
entities_country_codes: Search within the country codes of all of a notice's entities.
topic_facet: Filter on the topics facet.
sender_name_facet: Filter on the sender_name facet.
principal_name_facet: Filter on the principal_name facet.
recipient_name_facet: Filter on the recipient_name facet.
tag_list_facet: Filter on a tag.
country_code_facet: Filter on the submitter's country code.
language_facet: Filter on the notice language code.
action_taken_facet: Filter on the action_taken facet.
date_received_facet
date_submitted
page: The page you're requesting. Defaults to the first page of results.
per_page: The number of results per page. Defaults to 10.
sort_by: One of date_received asc, date_received desc, relevancy asc, or relevancy desc. Defaults to relevancy asc.
*-require-all Parameters
Let's say we're searching for notices with "George Jetson" in the title. By default, the search engine will search for notices with "George OR Jetson".
If you include 'title-require-all=yes' in your query, then the search engine will search for notices with "George AND Jetson" in the title, narrowing down your results considerably.
The various *-require-all parameters need a non-null value to be enabled; "true" and "yes" are both acceptable.
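A small sketch of assembling such a query string in Python. The `build_search_url` helper and its `require_all` argument are illustrative names; the search.json path follows the URL format described in Understanding the Search Page Format, and the parameter names are those listed above.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://lumendatabase.org/notices/search.json"


def build_search_url(require_all=(), **fields):
    """Build a notice search URL; names in require_all get *-require-all=true."""
    params = dict(fields)
    for name in require_all:
        if name not in fields:
            raise ValueError(f"{name!r} has no query to require")
        params[f"{name}-require-all"] = "true"
    return f"{SEARCH_URL}?{urlencode(params)}"
```

For the George Jetson case, `build_search_url(title="George Jetson", require_all=["title"])` produces a URL ending in `?title=George+Jetson&title-require-all=true`. `urlencode` handles the URL escaping, so the values can be passed as plain text.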
Facets
See below for more information about available facets. You can get an idea of how facets are formatted by submitting facet-less fulltext queries and then inspecting the Facets metadata returned by the search.
Troubleshooting very large searches
Searches with a very large number of results (e.g. "DMCA") may fail. If this happens, you will see the URL for your search in the address bar, but a "REQUEST REFUSED (500)" error in the page.
Fix this by narrowing your search. For instance, add a date facet to the end of the URL, such as &date_received_facet=1586404800000.0..1602216000000.0. This example yields results from April 9, 2020, through October 9, 2020. See Epoch format below for how to generate good numbers for this facet, or simply append this example to your URL and you should be taken back to a search results page which you can then tweak as you wish.
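One way to generate such a facet value is sketched below. The helper name is hypothetical, and the `<millis>.0..<millis>.0` layout is simply copied from the example above rather than taken from a formal specification. The example's numbers correspond to midnight at UTC-4 on each date, which the sample call reproduces.

```python
from datetime import datetime, timedelta, timezone


def date_received_facet(start, end):
    """Format two timezone-aware datetimes as a date_received_facet value."""

    def to_millis(moment):
        if moment.tzinfo is None:
            raise ValueError("use timezone-aware datetimes")
        return int(moment.timestamp()) * 1000

    return f"{to_millis(start)}.0..{to_millis(end)}.0"


# Midnight at UTC-4 on both dates reproduces the example range above.
utc_minus_4 = timezone(timedelta(hours=-4))
example = date_received_facet(
    datetime(2020, 4, 9, tzinfo=utc_minus_4),
    datetime(2020, 10, 9, tzinfo=utc_minus_4),
)
# example == "1586404800000.0..1602216000000.0"
```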
Successful Responses
Return a JSON-encoded hash including an array of notices and metadata about the search results.
notices (Array): An array of Notices encoded as JSON data structures.
meta (Hash): Metadata about the results of the search. See Search Metadata.
Search Metadata
query (Query): The search query meta information. See Query.
facets (Facet): How the result set "falls" into metadata facets. See Facets.
current_page (Integer): The page number of the current set of results.
next_page (Integer): The page number of the next set of results, or null if this is the last page.
offset (Integer): How many results in the result set come before the current list of results.
per_page (Integer): Number of results per page.
previous_page (Integer): The page number of the previous set of results, or null if this is the first page.
total_entries (Integer): The total number of results for the search query.
total_pages (Integer): The total number of pages in this result set.
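The next_page field makes it straightforward to walk a result set. The generator below is a sketch under the assumption that each response has been decoded into a dictionary with "notices" and "meta" keys as described above; `iter_notices` and `fetch_page` are illustrative names, and `fetch_page` stands in for whatever function actually performs the HTTP request.

```python
def iter_notices(fetch_page, first_page=1):
    """Yield notices page by page until meta.next_page is null.

    fetch_page is any callable that takes a page number and returns the
    decoded JSON response (a dict with "notices" and "meta" keys).
    """
    page = first_page
    while page is not None:
        result = fetch_page(page)
        yield from result["notices"]
        page = result["meta"]["next_page"]
```

Keeping the HTTP call outside the generator lets the caller decide how to authenticate and how long to pause between requests.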
Query
term (String): The full-text search term.
sender_name_facet (String): The sender_name value if this facet was submitted in this search.
recipient_name_facet (String): The recipient_name value if this facet was submitted in this search.
topic_facet (String): The topic value if this facet was submitted in this search.
date_received_facet (String)
tag_list_facet (String): The tag_list value if this facet was submitted in this search.
country_code_facet (String): The country_code value if this facet was submitted in this search.
language_facet (String): The language value if this facet was submitted in this search.
Facet
Facets aggregate documents along specific metadata. See the Elasticsearch documentation for more information on facets.
Currently we're using value and range facets.
sender_name_facet (Terms): The top 10 sender_names in this result set.
recipient_name_facet (Terms): The top 10 recipient_names in this result set.
topic_facet (Terms): The top 10 topics in this result set.
tag_list_facet (Terms): The top 10 tags in this result set.
country_code_facet (Terms): The top 10 country_codes in this result set.
language_facet (Terms): The top 10 languages in this result set.
date_received (Range): The available date range facets.
Terms Facet
Only the most relevant metadata is described below. See the Elasticsearch documentation for more information. We only return the 10 most populous facets.
total (Integer): The total number of results that can be faceted on this term and query.
other (Integer): The number of results not included in the facets returned.
terms (Array): An array including the term and count of items that are in that term facet.
Range Facet
from (Integer)
to (Integer)
from_str (String): A textual representation of the start of this facet.
to_str (String): A textual representation of the end of this facet.
count (Integer): The number of documents that fall in this facet.
Example Successful Response
Note: Some facet information has been omitted to keep examples brief.
Example Unsuccessful Response
Search for Entities via fulltext
The Lumen database allows for paginatable full-text searches of Entities, useful to select existing entities during notice creation. This method is only available via the API, not the web site. For information about pagination metadata, please see Search notices via fulltext.
Method: GET
Endpoint: https://lumendatabase.org/entities/search.json?term=url_escaped_query&page=1
Parameters
term: The full-text query term.
page: The page you're requesting. Defaults to the first page of results.
per_page: The number of results per page. Defaults to 10.
Successful Responses
Return a JSON-encoded hash including an array of entities and metadata about the search results.
entities (Array): An array of Entities encoded as JSON data structures.
meta (Hash): Metadata about the results of the search. See Search Metadata.
Entity
id (string): The unique ID.
parent_id (string): The parent ID, or "null" if this is a "root" entity.
name (string): Full name.
address_line_1 (string)
address_line_2 (string)
state (string)
country_code (string): Ideally, an ISO country code.
phone (string)
email (string)
url (string)
city (string)
Example Successful Response
Example Unsuccessful Response
Create notice
Submits a new Notice to the Lumen system.
There are several different "Notice Types"; these are essentially subclasses of the abstract Notice class. "Notice Types" allow us to track notice-specific attributes and create serialized representations with logical attribute names. Please see Notice Type Mapping, which defines what attributes are remapped for each Notice Type.
An authentication token is required to submit via the API. Please ask Lumen staff for your authentication token. Use of the authentication token is described in the Authentication section.
Method: POST
Endpoint: https://lumendatabase.org/notices
Request
notice (Notice): Required.
authentication_token (string)
Notice
Note: The notice types must be sent exactly as written here, i.e., CamelCase with no spaces except DMCA which is all-caps.
title (string): Required.
type (string): Required. One of 'Counternotice', 'CourtOrder', 'DataProtection', 'Defamation', 'DMCA', 'LawEnforcementRequest', 'Other', 'PrivateInformation', 'GovernmentRequest', or 'Trademark'.
subject (string): Optional. A short description of the notice and its contents, e.g. "DMCA Notice Regarding Photographs" or "Court Order From Paris Court Re: Hate Speech".
body (string): Optional. Use this field to submit any additional text provided by the complainant that does not belong properly in any other notice field. Do not include sensitive information or PII here (e.g., phone numbers, street addresses, or allegedly defamatory language).
date_sent (string): Any parseable time value, e.g. "2013-05-21" or "2013-05-21 10:01:01 -04:00".
language (string): A two-character language code. See "Language" below for the list.
date_received (string): Any parseable time value, e.g. "2013-05-21" or "2013-05-21 10:01:01 -04:00".
source (string): How you received this notice (e.g., by mail or through an online form).
topic_ids ([int])
tag_list (string): Comma-separated tags; spaces are OK. Automatically lowercased.
regulation_list (string): Comma-separated regulations / laws relevant to this notice; spaces are OK. Automatically lowercased. Only available for CourtOrder and LawEnforcementRequest notice types.
jurisdiction_list (string): Comma-separated list of jurisdictions; spaces are OK.
action_taken (string): One of 'Yes', 'No', or 'Partial'.
url_count (string): The total number of URLs contained in a particular Data Protection request. For example, if the requester asked for the removal of 10 URLs, this should be set to "10". If no value is set, this will display as "unspecified".
request_type (string): One of 'Agency', 'Civil Subpoena', 'Email', 'Records Preservation', 'Subpoena', 'Warrant'. Valid only for LawEnforcementRequest notices.
mark_registration_number (string): A mark registration number. Valid only for Trademark notices.
works_attributes ([Work]): A list of Works.
entity_notice_roles_attributes ([EntityNoticeRole]): A list of EntityNoticeRoles.
file_uploads_attributes ([FileUpload]): A list of FileUploads.
case_id_number (int): Optional. A court case number, specific to the CourtOrder type.
Work
copyrighted_urls_attributes ([CopyrightedUrl]): List of URLs that represent the original work.
kind (string): Required. Book, movie, video, etc.
description (string): Description of the work, which may include the copyright holder information.
infringing_urls_attributes ([InfringingUrl]): List of URLs which infringe on the work.
CopyrightedUrl
url (string): A URL that represents the original work. 8 kilobyte limit.
InfringingUrl
url (string): A URL that infringes upon the Work. 8 kilobyte limit.
EntityNoticeRole
name (string): The name of the role, one of "principal", "agent", "recipient", "sender", "target", "issuing_court", "plaintiff", "defendant" or "submitter". "recipient" and "sender" are displayed for all Notice Types on the Lumen website after automatic redaction.
entity_id (integer): (Optional) The ID of an existing entity. You should not specify the entity_attributes values below if you include entity_id here.
entity_attributes (Entity): A new entity that has this role on this notice. If you specify entity_attributes, you should not specify an entity_id.
Note: If not explicitly provided, recipient entities will be assigned based on the user performing the submission (as identified by the authentication token used).
The submitter entity will always be assigned based on the user performing the submission (as identified by the authentication token used).
Entity
name (string): Required.
kind (string): One of "organization" or "individual".
address_line_1 (string)
address_line_2 (string)
city (string)
state (string)
zip (string)
country_code (string): A two-letter ISO country code.
phone (string)
email (string)
url (string)
FileUpload
File uploading can only be used when submitting as form data. See the Submit Data section.
kind (string): One of "original" or "supporting".
file (string): Path to a file on the client's system.
Language
The supported two-letter language codes are based on what Google Translate supports, and are currently:
af: Afrikaans
ar: Arabic
be: Belarusian
bg: Bulgarian
ca: Catalan; Valencian
cs: Czech
cy: Welsh
da: Danish
de: German
el: Greek, Modern
en: English
eo: Esperanto
es: Spanish; Castilian
et: Estonian
fa: Persian
fi: Finnish
fr: French
ga: Irish
gl: Galician
hi: Hindi
hr: Croatian
ht: Haitian; Haitian Creole
hu: Hungarian
id: Indonesian
is: Icelandic
it: Italian
iw: Hebrew
ja: Japanese
ko: Korean
lt: Lithuanian
lv: Latvian
mk: Macedonian
ml: Malayalam
ms: Malay
mt: Maltese
nl: Dutch
no: Norwegian
pl: Polish
pt: Portuguese
ro: Romanian
ru: Russian
si: Sinhala
sk: Slovak
sl: Slovene
sq: Albanian
sr: Serbian
sv: Swedish
sw: Swahili
th: Thai
tl: Tagalog
tr: Turkish
uk: Ukrainian
vi: Vietnamese
yi: Yiddish
yo: Yoruba
zh: Chinese
Submit Data
When creating a notice through the API, there are two ways to submit notice data: as form data or as JSON data. Each has advantages and disadvantages described below.
Data submission must include an authentication token (see Authentication).
As Form Data
Form data is useful for attaching files to the notice as the files can be included by name and location on a local drive.
This is the suggested method as it is more efficient at attaching files, which are transmitted as binary data.
Sample using curl
In curl, use the @ symbol to specify a local file.
As JSON Data
JSON data is better if you do not have to submit file attachments as the syntax is simpler.
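As an illustration of the JSON route, the sketch below builds (but does not send) a POST request from Python. The field names come from the Create notice tables; the helper name, the user agent, the headers chosen, and every value in `example_notice` are assumptions made for the example.

```python
import json
from urllib.request import Request


def build_notice_request(token, notice):
    """Build a POST request that submits one notice as JSON."""
    body = json.dumps({"authentication_token": token, "notice": notice})
    return Request(
        "https://lumendatabase.org/notices",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "User-Agent": "my-research-script/0.1",  # hypothetical value
        },
    )


# Made-up notice using entity_attributes for both roles.
example_notice = {
    "title": "Example DMCA notice",
    "type": "DMCA",
    "date_received": "2013-05-21",
    "works_attributes": [
        {
            "kind": "movie",
            "description": "An example work",
            "infringing_urls_attributes": [{"url": "http://example.com/copy"}],
        }
    ],
    "entity_notice_roles_attributes": [
        {
            "name": "sender",
            "entity_attributes": {"name": "Example Sender", "kind": "individual"},
        },
        {
            "name": "recipient",
            "entity_attributes": {"name": "Example Host", "kind": "organization"},
        },
    ],
}
```

Passing the result of `build_notice_request(token, example_notice)` to `urllib.request.urlopen` would perform the submission. File attachments are not covered here, since they require form data.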
Example Requests
Example Request Using Entity Attributes
As Form Data
As JSON Data
Example Request Using Entity ID
As Form Data
As JSON data
Example Responses
Successful Responses
Successful responses will have HTTP status 201 (Created) and include a Location: HTTP header with the location of the created object.
Note: Unimportant headers have been removed.
Unsuccessful Responses
Most unsuccessful responses will have HTTP status 422 (Unprocessable Entity) and include a JSON response enumerating the validation failures.
If requests are too fast or too numerous, various other HTTP error codes may appear. If these are persistent, feel free to contact us and we will look into it. Be sure to include your IP address, user agent, and API key, if any.
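A client can branch on these outcomes along the following lines. This is a sketch: the function name and the returned tuples are illustrative, and the exact shape of the 422 validation JSON is not specified here, so it is passed through as decoded.

```python
import json


def interpret_create_response(status, headers, body):
    """Summarize a create-notice response.

    Returns ("created", location), ("invalid", errors) or ("error", status).
    """
    if status == 201:
        return ("created", headers.get("Location"))
    if status == 422:
        return ("invalid", json.loads(body))
    return ("error", status)
```

The three arguments map directly onto what most HTTP clients expose: a numeric status, a header mapping, and the raw response body.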
Notice Type Mapping
All notices are added to the system through the API using the same named parameters (see Create Notice). However, fields are "mapped" depending on the Notice Type when serialized as JSON through a search request or when requested individually.
Counternotice
Attributes remain unchanged from ingestion.
CourtOrder
body
explanation
regulations
laws_referenced
works.description
works.subject_of_court_order
works.infringing_urls
works.targetted_urls
works.copyrighted_urls
Yes
DataProtection
body
legal_complaint
works.description
works.complaint
works.infringing_urls
works.urls_mentioned_in_request
works.copyrighted_urls
Yes
Defamation
body
legal_complaint
works.infringing_urls
works.defamatory_urls
works.copyrighted_urls
Yes
DMCA
Attributes remain unchanged from ingestion.
GovernmentRequest
body
explanation
works.description
works.subject
works.infringing_urls
works.urls_of_original_work
works.copyrighted_urls
works.urls_mentioned_in_request
LawEnforcementRequest
body
explanation
works.description
works.subject_of_enforcement_request
works.infringing_urls
works.urls_in_request
works.copyrighted_urls
works.original_work_urls
regulation_list
regulations
Other
body
explanation
works.description
works.complaint
works.infringing_urls
works.problematic_urls
works.copyrighted_urls
works.original_work_urls
PrivateInformation
body
explanation
works.description
works.complaint
works.infringing_urls
works.urls_with_private_information
works.copyrighted_urls
Yes
Trademark
mark_registration_number
works.description
marks.description
works.infringing_urls
marks.infringing_urls
works.copyrighted_urls
Yes
Epoch format
In some of the facets we use the Unix epoch time format (https://en.wikipedia.org/wiki/Unix_time) in milliseconds.
An easy way to convert time from a human-readable format to epoch is using the https://www.epochconverter.com website. Use the converter to convert it to milliseconds.
Example: Tuesday, 1 January 2019 12:00:00 UTC = 1546344000000 milliseconds.
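The same conversion can be done locally in Python rather than through a website. The function names are illustrative; the sample value is the 1 January 2019 example above, interpreted as UTC.

```python
from datetime import datetime, timezone


def to_epoch_millis(moment):
    """Convert a timezone-aware datetime to Unix epoch milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return int(moment.timestamp() * 1000)


def from_epoch_millis(millis):
    """Convert Unix epoch milliseconds back to a UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


noon = datetime(2019, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
# to_epoch_millis(noon) == 1546344000000
```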
Scripting
Lumen Database does not offer packaged downloads of data but we also do not forbid researchers from writing scripts to perform searches or retrieve notice data. However, we do have a few suggestions that will help keep our system responsive and also help us if you run into any issues.
Either request .json endpoints or include the "Accept: application/json" header.
Consider adding an accept-encoding header to indicate you prefer compressed results (in curl, -H "Accept-Encoding: gzip").
Make sure you are using your authentication token to avoid rate limiting.
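These suggestions might be combined in a script as sketched below. Python's `urllib` does not decompress responses automatically, so the body is unpacked by hand when the server reports gzip encoding. The function names and the user agent value are assumptions for this example.

```python
import gzip
import json
from urllib.request import Request


def scripted_request(url, token, user_agent="my-research-script/0.1"):
    """Build a GET request that asks for compressed JSON."""
    return Request(
        url,
        headers={
            "Accept": "application/json",
            "Accept-Encoding": "gzip",
            "X-Authentication-Token": token,
            "User-Agent": user_agent,  # hypothetical value
        },
    )


def decode_json_body(raw, content_encoding=None):
    """Decompress the body if the server gzipped it, then parse the JSON."""
    if content_encoding and "gzip" in content_encoding.lower():
        raw = gzip.decompress(raw)
    return json.loads(raw)
```

After `urlopen(request)`, the raw bytes come from `response.read()` and the encoding from `response.headers.get("Content-Encoding")`.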
Understanding the Search Page Format
URL Formats
A search for notices including the term "Lumen" looks like this: https://www.lumendatabase.org/notices/search?term=Lumen.
For a search that has multiple words, &term-require-all=true is a handy modifier: lumendatabase.org/notices/search?term=Lumen+Project&term-require-all=true.
To request the search results page in .json format, add the .json extension to the URL like so: lumendatabase.org/notices/search.json?term=Lumen+Project&term-require-all=true.
And then to get other search result pages, simply add the &page= modifier: lumendatabase.org/notices/search.json?sort_by=&term-require-all=true&term=Lumen+Project&utf8=%E2%9C%93&page=2.