## Description of changes
Closes #893
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- Adds support for pydantic v2.0 by changing how the Collection model is initialized
- This simple change fixes pydantic v2
- Fixes the cross version tests to handle pydantic specifically
- Conditionally imports pydantic-settings based on what is available. In
v2 BaseSettings was moved to a new package.
- New functionality
- N/A
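
As a rough illustration of the conditional import described above, here is a minimal sketch. The helper names `import_first_available` and `load_base_settings` are hypothetical and not taken from the PR; the only assumption about pydantic is that `BaseSettings` lives in `pydantic_settings` for v2 and in `pydantic` for v1.

```python
import importlib
from types import ModuleType
from typing import Sequence


def import_first_available(candidates: Sequence[str]) -> ModuleType:
    """Return the first module in `candidates` that can be imported."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not installed, try the next candidate
    raise ImportError(f"None of these modules could be imported: {list(candidates)}")


def load_base_settings() -> type:
    """Hypothetical helper: pick BaseSettings from whichever package provides it."""
    # pydantic v2 moved BaseSettings into the separate pydantic-settings package
    # (module name pydantic_settings); pydantic v1 exposes it from pydantic itself.
    module = import_first_available(["pydantic_settings", "pydantic"])
    return module.BaseSettings
```

Preferring `pydantic_settings` first means a v2 environment picks up the new package, while a v1 environment falls through to `pydantic`. A real implementation would likely want a clearer error for v2 installs that lack pydantic-settings.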
## Test plan
Existing tests were run with the following configs
1. Fastapi < 0.100, Pydantic >= 2.0 - Unsupported as the fastapi
dependencies will not allow it. They likely should, as pydantic.v1
imports would support this, but this is a downstream issue.
2. Fastapi >= 0.100, Pydantic >= 2.0, Supported via normal imports ✅
(Tested with fastapi==0.103.1, pydantic==2.3.0)
3. Fastapi < 0.100 Pydantic < 2.0, Supported via normal imports ✅
(Tested with fastapi==0.95.2, pydantic==1.9.2)
4. Fastapi >= 0.100, Pydantic < 2.0, Supported via normal imports ✅
(Tested with latest fastapi, pydantic==1.9.2)
- [x] Tests pass locally with `pytest` for python, `yarn test` for js
## Documentation Changes
None required.
## Description of changes
This PR accomplishes two things:
- Adds batching to metrics to decrease load to Posthog
- Adds more metric instrumentation
Each `TelemetryEvent` type now has a `batch_size` member defining how
many of that Event to include in a batch. `TelemetryEvent`s with
`batch_size > 1` must also define `can_batch()` and `batch()` methods to
do the actual batching -- our posthog client can't do this itself since
different `TelemetryEvent`s use different count fields. The Posthog
client combines events until they hit their `batch_size` then fires them
off as one event.
NB: this means we can drop up to `batch_size` events -- since we only
batch `add()` calls right now this seems fine, though we may want to
address it in the future.
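
To make the batching flow concrete, here is a minimal sketch under stated assumptions. `AddEvent` and `BatchingClient` are hypothetical stand-ins, not the actual classes in this PR, and the batch size of 3 is chosen only to keep the example small.

```python
from dataclasses import dataclass
from typing import ClassVar, List, Optional


@dataclass
class AddEvent:
    """Hypothetical batchable event that counts added records."""

    batch_size: ClassVar[int] = 3  # events per batch (illustrative value)
    collection_uuid: str
    add_amount: int

    def can_batch(self, other: "AddEvent") -> bool:
        # Only events for the same collection may be merged.
        return self.collection_uuid == other.collection_uuid

    def batch(self, other: "AddEvent") -> "AddEvent":
        # Each event type knows which count fields to sum.
        return AddEvent(self.collection_uuid, self.add_amount + other.add_amount)


class BatchingClient:
    """Hypothetical client that buffers events until a batch is full."""

    def __init__(self) -> None:
        self.sent: List[AddEvent] = []  # stands in for the network call
        self._pending: Optional[AddEvent] = None
        self._count = 0

    def capture(self, event: AddEvent) -> None:
        if event.batch_size <= 1:
            self.sent.append(event)  # unbatched events go out immediately
            return
        if self._pending is not None and self._pending.can_batch(event):
            self._pending = self._pending.batch(event)
            self._count += 1
        else:
            if self._pending is not None:
                self.sent.append(self._pending)  # flush an incompatible batch
            self._pending = event
            self._count = 1
        if self._count >= event.batch_size:
            self.sent.append(self._pending)  # batch is full, fire it
            self._pending = None
            self._count = 0
```

Capturing seven single-record events with this sketch sends two merged events and leaves one buffered, which is the kind of event that is lost if the process exits before the batch fills.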
As for the additional telemetry, I pretty much copied Anton's draft
https://github.com/chroma-core/chroma/pull/859 with some minor changes.
Other considerations: Maybe we should implement `can_batch()` and
`batch()` on all events, even those which don't currently use them? I'd
prefer not to leave dead code hanging around but happy to go either way.
I created a ticket for the type ignores:
https://github.com/chroma-core/chroma/issues/1169
## Test plan
pytest passes modulo a couple of unrelated failures.
With `print(event.properties)` in posthog client's `_direct_capture()`:
```
>>> import chromadb
>>> client = chromadb.Client()
{'batch_size': 1}
>>> collection = client.create_collection("sample_collection")
{'batch_size': 1, 'collection_uuid': 'bb19d790-4ec7-436c-b781-46dab047625d', 'embedding_function': 'ONNXMiniLM_L6_V2'}
>>> collection.add(
... documents=["This is document1", "This is document2"], # we embed for you, or bring your own
... metadatas=[{"source": "notion"}, {"source": "google-docs"}], # filter on arbitrary metadata!
... ids=["doc1", "doc2"], # must be unique for each doc
... )
{'batch_size': 1, 'collection_uuid': 'bb19d790-4ec7-436c-b781-46dab047625d', 'add_amount': 2, 'with_documents': 2, 'with_metadata': 2}
>>> for i in range(50):
... collection.add(documents=[str(i)], ids=[str(i)])
...
{'batch_size': 20, 'collection_uuid': 'bb19d790-4ec7-436c-b781-46dab047625d', 'add_amount': 20, 'with_documents': 20, 'with_metadata': 0}
{'batch_size': 20, 'collection_uuid': 'bb19d790-4ec7-436c-b781-46dab047625d', 'add_amount': 20, 'with_documents': 20, 'with_metadata': 0}
>>> for i in range(50):
... collection.add(documents=[str(i) + ' ' + str(n) for n in range(20)], ids=[str(i) + ' ' + str(n) for n in range(20)])
...
{'batch_size': 20, 'collection_uuid': 'bb19d790-4ec7-436c-b781-46dab047625d', 'add_amount': 210, 'with_documents': 210, 'with_metadata': 0}
{'batch_size': 20, 'collection_uuid': 'bb19d790-4ec7-436c-b781-46dab047625d', 'add_amount': 400, 'with_documents': 400, 'with_metadata': 0}
{'batch_size': 20, 'collection_uuid': 'bb19d790-4ec7-436c-b781-46dab047625d', 'add_amount': 400, 'with_documents': 400, 'with_metadata': 0}
```
## Documentation Changes
https://github.com/chroma-core/docs/pull/139a4fd57d4d2
## Description of changes
`lifecycle` blocks don't allow variables. Right now our example
deployment for AWS doesn't work. @tazarov has a fix for this and a few
other things in https://github.com/chroma-core/chroma/pull/1173 but I'd
like to get the basic fix out before the weekend.
## Test plan
local `terraform init` failed before, now works.
## Documentation Changes
*Are all docstrings for user-facing APIs updated if required? Do we need
to make documentation changes in the [docs
repository](https://github.com/chroma-core/docs)?*
## Description of changes
The `IncludeEnum` enum is not exported, causing lint errors when using
`.get` or `.query`, as follows:
```js
const result = await collection.query({
queryTexts: [query],
// THIS LINE WILL PRODUCE LINT ERROR as it needs IncludeEnum.Distances etc.
include: ['distances', 'documents', 'metadatas'],
nResults: 2,
})
```
## Test plan
Nil
## Documentation Changes
Nil
Refs: #1105
## Description of changes
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- JS Client support for $in and $nin
## Test plan
*How are these changes tested?*
- [x] Tests pass locally `yarn test` for js
## Documentation Changes
TBD
## Description of changes
*Summarize the changes made by this PR.*
- New functionality
- Adds a basic pulsar producer, consumer and associated tests, as well
as a docker compose file for the distributed version of chroma.
## Test plan
We added bin/cluster-test.sh, which starts pulsar and allows
test_producer_consumer to run the pulsar fixture.
## Documentation Changes
None required.
## Description of changes
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- The initial CLI PR (https://github.com/chroma-core/chroma/pull/1032) moved
the logging config inside chromadb. If the image is built with the current
setup, it fails with `Error: Invalid value for '--log-config': Path
'log_config.yml' does not exist.`
## Test plan
*How are these changes tested?*
Steps to reproduce (prior to this PR):
- `docker build -t chroma:canary .`
- `docker run --rm -it chroma:canary`
## Documentation Changes
*Are all docstrings for user-facing APIs updated if required? Do we need
to make documentation changes in the [docs
repository](https://github.com/chroma-core/docs)?*
Refs: #1104
## Description of changes
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- Expands the supported Cohere version range to include 6.x. This plays
nicely with the rest of the ecosystem.
## Test plan
*How are these changes tested?*
- [x] `yarn test` for js
## Documentation Changes
N/A
Refs: #989
## Description of changes
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- When the BF index overflows (`batch_size`) upon insertion of a large batch,
it is cleared. If a subsequent delete request then targets IDs that
were in the cleared BF index, a warning is raised for a non-existent
embedding. The issue was resolved by separately checking whether the
record exists in the BF index and conditionally executing the BF removal.
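
The conditional removal described above can be sketched as follows. `BruteForceIndex` and `delete_ids` are hypothetical simplifications for illustration, not the actual classes or method names in the codebase.

```python
import warnings
from typing import Dict, Iterable, List


class BruteForceIndex:
    """Hypothetical in-memory buffer of embeddings, keyed by record ID."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, record_id: str, vector: List[float]) -> None:
        self._vectors[record_id] = vector

    def has_id(self, record_id: str) -> bool:
        return record_id in self._vectors

    def delete(self, record_id: str) -> None:
        if record_id not in self._vectors:
            # This is the warning that fired after the index was cleared.
            warnings.warn(f"Delete of nonexistent embedding ID: {record_id}")
            return
        del self._vectors[record_id]

    def clear(self) -> None:
        self._vectors.clear()


def delete_ids(index: BruteForceIndex, ids: Iterable[str]) -> List[str]:
    """Delete only the IDs still held by the index; return those removed."""
    removed = []
    for record_id in ids:
        if index.has_id(record_id):  # check first, so no spurious warning
            index.delete(record_id)
            removed.append(record_id)
    return removed
```

IDs that were flushed when the buffer was cleared are simply skipped here, on the assumption that they are handled by whatever index they were flushed to.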
## Test plan
*How are these changes tested?*
- [x] Tests pass locally with `pytest` for python
## Documentation Changes
N/A
Refs: #1104
## Description of changes
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- Removed transformers and web-ai peer dependencies.
## Test plan
*How are these changes tested?*
- [ ] Manual testing - `mkdir testproject && cd testproject && npm init
-y && npm link chromadb && npm add langchain`
## Documentation Changes
N/A
- Including only CIP for review.
Refs: #1049
## Description of changes
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- New proposal to handle large batches of embeddings gracefully
## Test plan
*How are these changes tested?*
- [ ] Tests pass locally with `pytest` for python, `yarn test` for js
## Documentation Changes
TBD
---------
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
Co-authored-by: Sunil Kumar Dash <47926185+sunilkumardash9@users.noreply.github.com>
Refs: #1083
## Description of changes
*Summarize the changes made by this PR.*
- New functionality
- JS Client now supports Authorization and X-Chroma-Token auth
- Tests and integration tests updated
## Test plan
*How are these changes tested?*
- [x] Tests pass locally `yarn test` for js
## Documentation Changes
TBD
## Description of changes
- Improvements
- simplified unit tests
- cleaned up a typing import
## Test plan
- [+] Tests passed successfully locally with `pytest` for python, `yarn
test` for js
## Documentation Changes
N/A
## Description of changes
*Summarize the changes made by this PR.*
- Added a workflow_dispatch to manually trigger test workflows
- will be good for development experience
---------
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
## Description of changes
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- Added bandit scanning for all pushes to repo
## Test plan
*How are these changes tested?*
Manual testing of the workflow
## Documentation Changes
N/A - unless we want to start a separate security section in the main
docs repo.
---------
Co-authored-by: Hammad Bashir <HammadB@users.noreply.github.com>
This proposes an initial CLI
The CLI is installed when you run `pip install chromadb`.
You then call the CLI with
`chroma run --path <persist_dir> --port <port>` where path and port are
optional.
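
For readers unfamiliar with the command shape, here is a rough sketch of the `chroma run` interface. It deliberately uses the standard library's `argparse` instead of `typer` so that it runs without extra dependencies; the default values for `--path` and `--port` are assumptions made for illustration only.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring `chroma run --path <persist_dir> --port <port>`."""
    parser = argparse.ArgumentParser(prog="chroma")
    subcommands = parser.add_subparsers(dest="command", required=True)

    run = subcommands.add_parser("run", help="Run a Chroma server")
    # Both options are optional; these defaults are hypothetical.
    run.add_argument("--path", default="./chroma_data", help="Persist directory")
    run.add_argument("--port", type=int, default=8000, help="Port to listen on")
    return parser


def parse(argv: List[str]) -> argparse.Namespace:
    """Parse an argument list such as ["run", "--port", "9000"]."""
    return build_parser().parse_args(argv)
```

Calling `parse(["run"])` falls back to the assumed defaults, while `parse(["run", "--port", "9000"])` overrides only the port.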
This also adds `chroma help` and `chroma docs` as convenience links -
but I'm open to removing those.
To make this easy - I added `typer` (by the author of FastAPI). I'm not
sure this is the tool that we want to commit to for a fuller featured
CLI, but given the extremely minimal footprint of this - I don't think
it's a one way door.
<img width="1477" alt="Screenshot 2023-08-23 at 4 59 54 PM"
src="https://github.com/chroma-core/chroma/assets/891664/30374228-d303-41e1-8e9e-188b7f8532d4">
***
#### TODO
- [x] test in fresh env - i think i need to add `typer` as a req
- [ ] consider expanding the test to make sure the service is actually
running
- [x] hide the test option from the typer UI
- [x] linking to a getting started guide could be interesting at the top
of the logs
## Description of changes
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- Bump HNSWlib to latest version that has precompiled binaries. Use
alpha release for CI tests before releasing
## Test plan
Existing tests should cover functionality. Build compatibility of the
binaries was manually verified.
- [x] Tests pass locally with `pytest` for python, `yarn test` for js
## Documentation Changes
We should add how to force recompiling with AVX to the docs.
## Description of changes
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- Added external volume for Chroma data
- Bumped to the latest version (0.4.9)
- Added auth by default
- Made the template more configurable via variables with sensible
defaults
## Test plan
*How are these changes tested?*
- Tested with terraform
## Documentation Changes
The update includes a README with docs.
## Description of changes
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- Added additional validations to URLs - URLs like api-gw.aws.com/dev
will now trigger an error asking the user to correctly specify the URL
with http or https
- When the full URL (http(s)://example.com) is provided by the user, the
port parameter is ignored (debug message is logged). An assumption is
made that the URL is entirely defined, thus not requiring additional
alterations such as injecting the port.
- Added negative test cases for invalid URLs
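
The validation rules above could be sketched roughly as below. The function name `resolve_base_url`, its parameters, and the default port are hypothetical and not the client's actual API.

```python
import logging
from urllib.parse import urlparse

logger = logging.getLogger(__name__)


def resolve_base_url(host: str, port: int = 8000, ssl: bool = False) -> str:
    """Hypothetical helper: turn a host or full URL into a base URL."""
    if host.startswith(("http://", "https://")):
        if not urlparse(host).netloc:
            raise ValueError(f"Invalid URL {host!r}: missing host name")
        # A full URL is assumed to be complete, so the port is not injected.
        logger.debug("Full URL provided; ignoring port %s", port)
        return host
    if "/" in host:
        # Something like api-gw.aws.com/dev: a path without a scheme is ambiguous.
        raise ValueError(
            f"Invalid URL {host!r}: specify the full URL with http:// or https://"
        )
    scheme = "https" if ssl else "http"
    return f"{scheme}://{host}:{port}"
```

With this sketch, a bare host name such as `localhost` gets the scheme and port added, a full URL is returned unchanged, and `api-gw.aws.com/dev` raises a `ValueError`.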
## Test plan
*How are these changes tested?*
- [x] Tests pass locally with `pytest` for python
## Documentation Changes
TBD
## Description of changes
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- Fixed an issue where list values for In/Nin that are not wrapped with
pypika ParameterValue result in floating-point comparison failures
beyond a certain precision threshold.
## Test plan
*How are these changes tested?*
- [x] Tests pass locally with `pytest` for python
## Documentation Changes
N/A
## Description of changes
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- Pushes images to dockerhub
## Test plan
*How are these changes tested?*
Will have to be tested on main as part of CI/CD
- [x] Tests pass locally with `pytest` for python, `yarn test` for js
## Documentation Changes
None required.
## Description of changes
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- Add npm run db:run to run docker-compose up
## Test plan
Will be tested manually. This is an iteration on #1095.
- [x] Tests pass locally with `pytest` for python, `yarn test` for js
## Documentation Changes
None
## Description of changes
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- Prevent python release from running on js tag pushes
## Test plan
Manually on main via triggering
## Documentation Changes
None
## Description of changes
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- Add working directory to npm run js
## Test plan
These must be tested after merge. This is an iteration on #1091.
- [x] Tests pass locally with `pytest` for python, `yarn test` for js
## Documentation Changes
None required.
## Description of changes
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- Adds release automation for JS client.
## Test plan
They will have to be tested after merging this PR
## Documentation Changes
Update the js client Develop guide for how to use this automation.
## Description of changes
*Summarize the changes made by this PR.*
- Improvements & Bug fixes
- Add explicit env for persist directory defaults to docker compose so
that it can hint at usage
## Test plan
Existing tests
- [x] Tests pass locally with `pytest` for python, `yarn test` for js
## Documentation Changes
None required