Initial Checks
- I have read and followed the docs and still think this is a bug
Description
Apologies, the title of this issue is not the best. I have a very odd case and can't work out what is causing it. I have also been unable to recreate the issue in a simpler example.
I have a DocList in which every document was built by the same process, although the data is of course different for each document. I am using the hnswlib backend.
After building the DocList without any errors, I run .find() on its individual elements: some of these calls succeed and some fail. The error raised by the failing calls is shown in the traceback below.
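To narrow this down across the whole list instead of testing indices by hand, one option is to run the same search on every element and record which ones raise. This is a minimal sketch under that assumption: `collect_failures` is a hypothetical helper (not part of docarray), and the commented usage line assumes the `doc_index` and `dl` objects from the code snippet.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def collect_failures(items: Iterable[T], action: Callable[[T], object]) -> List[Tuple[int, str]]:
    """Run `action` on every item and record (index, error) for each call that raises."""
    failures: List[Tuple[int, str]] = []
    for i, item in enumerate(items):
        try:
            action(item)
        except Exception as exc:  # deliberately broad: we want to see every failure mode
            failures.append((i, f"{type(exc).__name__}: {exc}"))
    return failures


# Hypothetical usage with the objects from the snippet:
# failures = collect_failures(
#     dl, lambda doc: doc_index.find(doc, search_field="EMBEDDINGS", limit=1)
# )
# print(len(failures), failures[:5])
```

Recording the index together with the message makes it easy to check whether the failing documents share anything in the source `data`.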
Code Snippet:
```python
import logging
import time

from docarray import BaseDoc, DocList
from docarray.index import HnswDocumentIndex
from docarray.typing import NdArray

logger = logging.getLogger(__name__)


class AddressDoc(BaseDoc):
    ELID: int
    FULL_ADDRESS: str
    EMBEDDINGS: NdArray[768]


def build_doc_list(data):
    st = time.time()
    dl = DocList[AddressDoc](
        AddressDoc(
            ELID=0000000,
            FULL_ADDRESS="",
            EMBEDDINGS=d["EMBEDDINGS"],
        )
        for d in data
    )
    logger.info(f"Doc list created... {time.time() - st}")
    return dl


# `data` and `db_path` come from the surrounding application (not shown).
doc_index = HnswDocumentIndex[AddressDoc](work_dir=db_path)
dl = build_doc_list(data)

# This works!
results = doc_index.find(dl[2], search_field="EMBEDDINGS", limit=1)
# This doesn't!
results = doc_index.find(dl[3], search_field="EMBEDDINGS", limit=1)

type(dl[2].EMBEDDINGS) == type(dl[3].EMBEDDINGS)  # returns True
type(dl[2].EMBEDDINGS.shape) == type(dl[3].EMBEDDINGS.shape)  # returns True
```

I have compared dl[2] and dl[3] from every angle and can't see what the issue is. The embedding arrays in both documents have the same shape, which I have checked with numpy (.shape, .ndim, .size). I can't work out what difference between the two causes the error below.
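Note that `type(x.shape)` is `tuple` for every NumPy array, so the second check in the snippet returns `True` whatever the actual shapes are. A minimal sketch for comparing the values instead is below. `embedding_summary` and `diff_embeddings` are hypothetical helper names, and the sketch assumes the `EMBEDDINGS` fields behave as NumPy arrays (docarray's `NdArray` subclasses `np.ndarray`).

```python
import numpy as np


def embedding_summary(arr) -> dict:
    """Collect shape-related attributes of one embedding as values, not types."""
    a = np.asarray(arr)
    return {
        "shape": a.shape,
        "ndim": a.ndim,
        "size": a.size,
        "dtype": str(a.dtype),
        "c_contiguous": bool(a.flags["C_CONTIGUOUS"]),
    }


def diff_embeddings(a, b) -> dict:
    """Return only the attributes whose values differ between two embeddings."""
    sa, sb = embedding_summary(a), embedding_summary(b)
    return {key: (sa[key], sb[key]) for key in sa if sa[key] != sb[key]}


# Hypothetical usage with the objects from the snippet:
# print(diff_embeddings(dl[2].EMBEDDINGS, dl[3].EMBEDDINGS))
```

An empty dict means the two embeddings agree on shape, dimensionality, size, dtype and memory layout; anything else is listed as a `(left, right)` pair.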
Traceback below:
```
File /usr/local/lib/python3.11/site-packages/docarray/index/abstract.py:503, in BaseDocIndex.find(self, query, search_field, limit, **kwargs)
    501 query_vec = query
    502 query_vec_np = self._to_numpy(query_vec)
--> 503 docs, scores = self._find(
    504 query_vec_np, search_field=search_field, limit=limit, **kwargs
    505 )
    507 if isinstance(docs, List) and not isinstance(docs, DocList):
    508 docs = self._dict_list_to_docarray(docs)

File /usr/local/lib/python3.11/site-packages/docarray/index/backends/hnswlib.py:328, in HnswDocumentIndex._find(self, query, limit, search_field)
    324 def _find(
...
--> 197 return cls._docarray_from_native(x.reshape(source.shape))
    198 elif len(source.shape) > 0:
    199 return cls._docarray_from_native(np.zeros(source.shape))

ValueError: cannot reshape array of size 768 into shape (768,768)
```
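The message itself comes from NumPy's reshape check. The failing frame rebuilds an array from a flat buffer `x` using a recorded `source.shape`, and a reshape only succeeds when the number of elements equals the product of the target shape. Here the buffer holds 768 values, but the recorded shape asks for 768 × 768 = 589,824. The sketch below reproduces that check in isolation; `can_reshape` is a hypothetical helper, not part of docarray or NumPy.

```python
import math

import numpy as np


def can_reshape(size: int, shape: tuple) -> bool:
    """A reshape succeeds only if the element count equals the product of the target shape."""
    return size == math.prod(shape)


flat = np.zeros(768, dtype=np.float32)  # 768 values, like one embedding
print(can_reshape(flat.size, (768,)))  # True
print(can_reshape(flat.size, (768, 768)))  # False: 768 != 589,824

try:
    flat.reshape((768, 768))
except ValueError as exc:
    # NumPy raises a ValueError of the same kind as in the traceback above
    print(exc)
```

In other words, whatever produced the failing result carried a 768-element buffer paired with a two-dimensional shape, rather than the query array itself having the wrong size.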
Example Code
No response
Python, DocArray & OS Version
0.39.0
Affected Components